# Awesome Prompt Injection Defense
A curated list of tools, papers, datasets, and resources for defending Large Language Models against prompt injection and indirect prompt injection attacks.
Prompt injection is the top-ranked risk in the OWASP Top 10 for LLM Applications, yet the defense ecosystem is fragmented across academic preprints, vendor blogs, npm/PyPI utilities, and ad-hoc system prompts. This list is an attempt to bring it together in one place.
## Contents

- Detection libraries
- RAG-specific guardrails
- Evaluation datasets
- Live demos
- GitHub Actions for CI
- Research papers and preprints
- Background reading
- Contributing
- License
## Detection libraries

Drop-in checks you call before passing untrusted text to an LLM.
- prompt-injection-shield (npm) - Zero-dep detector for classic override, URL exfiltration, system-prompt impersonation, and tool-call hijack patterns.
- prompt-injection-shield-py (PyPI) - Python port of the above with the same rule set.
- Rebuff - Self-hardening prompt injection detector, originally by ProtectAI.
- LLM Guard - Comprehensive LLM input/output security suite, includes prompt injection scanner.
- PromptArmor - Hosted prompt injection detection API.
- Lakera Guard - Commercial guardrail with a generous free tier.
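The libraries above share one shape: a predicate you run over untrusted text before it reaches the model. A minimal sketch of that idea, using hypothetical regex rules rather than any listed package's actual API:

```python
import re

# Illustrative override patterns only -- real detectors such as the
# libraries listed above use far larger rule sets and/or ML classifiers.
OVERRIDE_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"you\s+are\s+now\s+in\s+(developer|dan)\s+mode", re.I),
    re.compile(r"reveal\s+(your\s+)?(system\s+prompt|instructions)", re.I),
]

def looks_injected(text: str) -> bool:
    """Return True if the text matches a classic override pattern."""
    return any(p.search(text) for p in OVERRIDE_PATTERNS)
```

Pattern matching alone misses encoded and paraphrased attacks, which is why several of the tools above pair rules with a classifier.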
## RAG-specific guardrails

Prompt injection in RAG often hides inside retrieved documents (indirect injection) or poisoned vectors.
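A common mitigation is to screen each retrieved chunk with a detector and fence the survivors as inert data before they enter the prompt. A sketch, assuming a hypothetical `detector` predicate and `<doc>` delimiters of your choosing:

```python
def build_context(chunks: list[str], detector) -> str:
    """Drop chunks the detector flags, then fence the rest as data.

    `detector` is any callable str -> bool (e.g. a detection library
    from the section above); the delimiter scheme is illustrative.
    """
    safe = [c for c in chunks if not detector(c)]
    # Delimiting retrieved text signals to the model that it is
    # reference material, not instructions to follow.
    return "\n".join(f"<doc>{c}</doc>" for c in safe)
```

Filtering plus delimiting reduces, but does not eliminate, indirect injection risk; poisoned chunks that evade the detector still reach the model.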
## Evaluation datasets

Labeled corpora for benchmarking detectors.
- prompt-injection-eval - 74 hand-curated rows across 9 categories (classic override, URL exfil, system impersonation, tool hijack, role override, encoded, indirect RAG poison, borderline, benign). MIT.
- deepset/prompt-injections - Larger English/German prompt injection corpus.
- JailbreakBench - Standardised benchmark for jailbreaks (related but broader category).
- Lakera Gandalf prompts - Real attack prompts collected from the Gandalf game.
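Benchmarking against any of these corpora reduces to scoring a detector over labeled rows. A sketch, assuming rows of `(text, label)` pairs with `1` = injection and `0` = benign (adapt to each dataset's actual column names):

```python
def evaluate(detector, rows):
    """Return (precision, recall) for a str -> bool detector
    over (text, label) pairs. Row format is an assumption."""
    tp = fp = fn = tn = 0
    for text, label in rows:
        pred = detector(text)
        if pred and label:
            tp += 1
        elif pred and not label:
            fp += 1
        elif not pred and label:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Report both numbers: rule-based detectors tend toward high precision and low recall, and a single accuracy figure hides that trade-off, especially on corpora with deliberate borderline rows.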
## Live demos

Try detectors in the browser.
## GitHub Actions for CI

Plug into pull-request flows.

- rag-guardrails-action - Composite Action wrapping prompt-injection-shield + vector-poison-score. Fails on high severity, warns on medium.
## Research papers and preprints

## Related agent tooling

Not strictly prompt injection defense, but commonly composed with it.
- agentvet - Validate LLM tool-call arguments before execution.
- agentguard - Network egress firewall for tool-using agents.
- agentcast - Validate-and-retry loop for structured outputs.
- agentsnap - Snapshot tests for tool-call traces.
- agentfit - Token-aware message truncation.
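The validate-before-execute idea behind tools like agentvet can be sketched as a schema check that runs before any tool call fires. The schema format and function names here are hypothetical, not agentvet's actual API:

```python
# Hypothetical allow-list: tool name -> required argument types.
ALLOWED_TOOLS = {
    "get_weather": {"city": str},
    "send_email": {"to": str, "body": str},
}

def vet_call(name: str, args: dict) -> None:
    """Raise ValueError unless the tool call matches its declared schema."""
    schema = ALLOWED_TOOLS.get(name)
    if schema is None:
        raise ValueError(f"unknown tool: {name}")
    for key, typ in schema.items():
        if key not in args or not isinstance(args[key], typ):
            raise ValueError(f"bad argument {key!r} for {name}")
    extra = set(args) - set(schema)
    if extra:
        # Injected instructions often smuggle extra arguments in.
        raise ValueError(f"unexpected arguments: {sorted(extra)}")
```

This composes naturally with the detectors above: detection filters what the model sees, while call vetting limits what a compromised model can do.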
## Background reading
## Contributing

Send a pull request. Each entry should:
- Already exist (no vaporware or roadmaps).
- Be free or have a meaningful free tier.
- Have a one-line description of what makes it useful, not just a name.
Sort entries within each section alphabetically, except where the order is meaningful.
## License

To the extent possible under law, the maintainer has waived all copyright and related rights to this list.