Prompt injection didn’t die—it evolved into something far more insidious.
2026 Attack Vectors
Indirect Injection via Retrieval
- RAG systems are now the primary attack surface. Malicious documents uploaded to shared knowledge bases execute arbitrary code when retrieved.
- Technique: “Trojan documents” that look benign but contain carefully crafted token sequences that hijack the model’s tool-calling behavior.
Tool Poisoning Attacks
- MCP (Model Context Protocol) servers and custom tools are being backdoored at the registry level.
- A single compromised tool can exfiltrate conversation history or execute shell commands on the host when the agent decides to “use” it.
Cross-Platform Persistence
- Attacks that survive across sessions by poisoning user preference vectors or long-term memory stores.
- One documented case: A finance agent was tricked into authorizing $2.3M wire transfers over 6 weeks via gradual preference manipulation.
Defense Strategies That Actually Work
- Deterministic Output Guardrails at the inference layer (not just system prompts)
- Tool Sandboxing with strict capability boundaries and human-in-the-loop for high-risk actions
- Provenance Tracking for every retrieved context chunk
- Adversarial Training using the exact techniques from red team reports above
The butchers who survive are those who treat every external input as potentially hostile code.
Stay sharp. The models are watching.