Securing the AI & LLM Supply Chain: Prompt Injection and Beyond
Models, prompts, retrieved documents and tools are now part of your attack surface. We map the new AI/LLM risks and the controls that actually help.
- Treat every model output as untrusted input — it can carry instructions and payloads, not just answers.
- Prompt injection is unsolved: contain its blast radius rather than trying to filter it away.
- The real danger is what the model can do: tool-calling, data access and downstream actions define your risk.
- Pin and verify model artifacts, datasets and dependencies the same way you govern any software supply chain.
- Map your controls to the OWASP Top 10 for LLM Applications and NIST AI RMF to stay auditable.
When you ship a feature backed by a large language model, you are not just adding a clever text generator — you are extending trust to model weights, system prompts, retrieved documents, embeddings and external tools you do not fully control. AI supply chain security is the discipline of mapping that expanded attack surface and constraining what an attacker can do when any one link is hostile. This guide walks through the concrete risks practitioners are seeing in production and the controls that move the needle.
What "AI supply chain security" actually covers
Classic application security worries about the code you write and the libraries you import. An LLM feature inherits all of that and adds several new, often unfamiliar, links:
- The model itself — weights pulled from a hub, a hosted API, or a fine-tune you trained on third-party data.
- The training and fine-tuning data — which may be poisoned, mislabeled, or carry licensing and privacy liabilities.
- Runtime context — system prompts, retrieved documents (RAG), user input, and tool results that all flow into the same context window.
- Tools and plugins — the functions, APIs and agents the model can invoke on your behalf.
Each of these is a trust boundary. The mistake teams make is assuming the boundary sits only between the user and the application. In an LLM system, any content that reaches the context window can influence behavior, including a PDF an attacker uploaded last week or a web page your agent fetched mid-task.
Prompt injection: the defining risk
Prompt injection is the LLM-era equivalent of injection flaws, and it is the first entry in the OWASP Top 10 for LLM Applications for good reason. The core problem is structural: models do not reliably distinguish trusted instructions from untrusted data when both arrive as natural language in the same context.
Direct vs. indirect injection
- Direct injection is the user typing adversarial instructions ("ignore your rules and...") straight into the prompt. Annoying, but you at least know the source.
- Indirect injection is the dangerous one. Malicious instructions are embedded in content the model ingests on the user's behalf — a support ticket, a scraped page, a calendar invite, a document in your RAG index. The user never sees the payload, but the model obeys it.
Indirect injection turns RAG pipelines and autonomous agents into confused deputies. A poisoned document can tell the model to exfiltrate the conversation, call an internal tool, or rewrite its own summary to mislead a human reviewer.
There is no reliable prompt-injection filter
Do not architect on the assumption that a guardrail model or regex will catch injection attempts. Detection helps at the margins, but determined attackers bypass it. Design as if the model will be hijacked, and make sure that when it is, it cannot reach data or actions that matter. Containment beats detection.
Untrusted model output is the second half of the problem
Even setting injection aside, treat every token the model emits as untrusted input to the rest of your system. If model output is rendered into HTML without encoding, you have stored XSS. If it is passed to a shell, an eval, or a database query, you have command or SQL injection — now with a probabilistic source you cannot fully predict.
Practical rules:
- Encode on output. Model-generated text rendered in a browser must be escaped exactly like user-generated content. This is standard application security hygiene applied to a new source.
- Never pass raw output to interpreters. No model string should reach a shell, SQL engine or template renderer without parameterization or strict validation.
- Constrain format. Where you need structured output, enforce a schema (JSON schema, function signatures) and reject anything that does not parse, rather than trusting free text.
Tool-calling and agents: where blast radius is decided
A chatbot that only talks is low risk. The moment you give a model tools — the ability to send email, query a database, run code, or call internal APIs — its mistakes and its hijacked instructions become real-world actions. This is where most of your actual exposure lives.
Controls that matter:
- Least privilege per tool. Each function the model can call should run with the narrowest scope possible. A "read customer record" tool should never be able to write or delete.
- Human-in-the-loop for sensitive actions. Irreversible or high-impact operations (payments, deletions, outbound communications) should require explicit confirmation outside the model's control.
- Deterministic authorization. Enforce access control in your own code, keyed to the authenticated user — not to whatever identity the model claims it is acting as. The model is not an authorization boundary.
- Isolate untrusted data from privileged tools. If a session has ingested external content, treat it as tainted and restrict which tools remain available. Patterns like dual-LLM or capability gating help here.
For agentic systems, it helps to model attacker techniques explicitly. Frameworks such as MITRE ATLAS catalogue adversarial behaviors against AI systems and give you a vocabulary for threat modeling beyond generic appsec.
Securing the artifacts: models, data and dependencies
The "supply chain" in AI supply chain security is not a metaphor. Model artifacts arrive through the same risky channels as any dependency:
- Pin and verify model provenance. Reference models by immutable digest, not a mutable tag. Prefer signed artifacts and verify checksums. Avoid loading weights in formats that allow arbitrary code execution during deserialization; prefer safe serialization formats.
- Govern training and fine-tuning data. Data poisoning can implant backdoors or bias that surface only on specific triggers. Track dataset lineage, validate sources, and keep a record of what went into each model version.
- Generate an AI-aware SBOM. Extend your software bill of materials to include models, datasets and the ML libraries around them, so you can answer "are we affected?" when a vulnerable component is disclosed.
- Test before you trust. Subject AI features to adversarial evaluation and penetration testing that specifically targets injection, jailbreaks, data leakage and tool abuse — not just classic web flaws.
Map controls to recognized frameworks
You do not need to invent a methodology. Align your program to references your auditors and partners already accept:
- OWASP Top 10 for LLM Applications — a prioritized risk list (prompt injection, insecure output handling, supply chain, excessive agency, sensitive information disclosure) that maps cleanly to engineering tasks.
- NIST AI Risk Management Framework (AI RMF) — for governance, accountability and lifecycle risk.
- MITRE ATLAS — adversary tactics and techniques specific to AI systems, useful for red-team scoping.
- ISO/IEC 42001 — for organizations building an auditable AI management system.
Use these to structure reviews and to demonstrate diligence, but remember the list is a floor, not a ceiling.
Conclusion
LLM features expand your attack surface in ways traditional appsec checklists miss: untrusted instructions arrive as data, model output is itself untrusted, and tool access converts model mistakes into real actions. The durable strategy is not to chase a perfect prompt filter but to contain blast radius — least privilege, deterministic authorization, output encoding, verified artifacts and adversarial testing. Get those right and a hijacked prompt becomes an annoyance instead of an incident.
If you are putting AI features into production and want a clear-eyed assessment of where they could break, talk to our team — we will help you threat-model the pipeline and harden it before attackers do.
Written by
Luca Romano
AppSec & AI Security Engineer · OSWE
Secures application and AI/LLM supply chains. Maintainer of several SAST rulesets.
Plus d'articles du blog
What Is Penetration Testing? Types, Process & Benefits (2026 Guide)
A complete, practitioner-led guide to penetration testing in 2026: the main types, the five-phase process, what a strong report looks like, and how to choose a provider.
VAPT vs Penetration Testing: What's the Difference?
VAPT and penetration testing are often confused. Here is exactly how they differ, when to use each, and how to combine them into one effective security program.
Zero Trust Architecture: A Pragmatic Guide for 2026
Zero Trust is sold as a product but it is an architecture. Here is a realistic, identity-first roadmap to implement it without rebuilding your network overnight.