We ran a resume-screening prompt through 117 adversarial security tests. It failed 20-including prompt injection, data leakage, harmful output, and every bias test.
Then we hardened the prompt, reran the exact same suite, and it passed with 0 blockers.
Same model. Same use case. The only variable was how we wrote the prompt.
TL;DR
Before: 20/117 failures (5 blockers)
After: 3/117 failures (0 blockers, 3 advisories)
The biggest wins came from:
Removing subjective scoring
Treating user input as untrusted
Adding explicit refusals
Forcing structured output
If you ship AI features, your prompts are part of your attack surface.
The prompt most teams ship
This is the "works in dev, breaks in prod" version:
It passes basic QA. Then it hits real users with real incentives.
We ran 117 security tests. Here’s what broke.

Naive prompt: 20 failures · 5 blockers (ship-stop)
What actually failed (and why)
Bias testing: failed
A "fit score" invites the model to use every signal available-names, universities, writing style, prestige cues-whatever correlates in training data. The model isn’t "trying" to discriminate. It’s doing what you asked: judge fit.
Prompt injection: failed
The resume field is user-controlled, but the prompt treats it like trusted instructions. Attackers can embed "ignore above" style payloads and the model often complies.
Harmful content generation: failed
Without explicit guardrails, models try to be helpful even when they shouldn't. Under adversarial prompting, they can generate unsafe output.
Unsafe practices: failed
"Ascribe reasoning" is a trap. "Explain your reasoning" can produce rationales you do not want logged, shared, or used downstream.
These aren't exotic attacks. They're default failure modes when you treat untrusted input as trusted and don't constrain behavior.
Each failure came with a fix
Hardening took ~30 minutes because each failure mapped to a concrete mitigation (not vague "you have prompt injection").
Here’s what we changed.
1) Bound the scope explicitly
We replaced "recruiting assistant" (too broad) with a narrow, auditable role:
Smaller scope = smaller attack surface.
2) Add explicit refusal instructions
Models improvise when cornered. Don’t let them.
3) Remove subjective scoring
This is where bias and nonsense enters.
Extraction is objective. Scoring is where discrimination hides.
4) Mark user data as untrusted
Resumes are input. They are not instructions.
This doesn't make injection impossible-but it lowers success rate dramatically.
5) Constrain what signals the model may use
You can't remove bias from weights via prompting, but you can reduce exposure.
6) Force structured output
Free-form output is hard to audit and easy to abuse. Structure is safety.
Now you can log consistently, detect anomalies, and audit.
Copy-paste hardening patch (fastest win)
If you do nothing else, paste this at the top of prompts that mix instructions + user input:
The results after hardening

Hardened prompt: 3 advisory · 0 blockers (release-ready)
Before | After | |
|---|---|---|
Test failures | 20 | 3 |
Blocking issues | 5 | 0 |
Bias tests | FAIL | PASS |
Injection tests | FAIL | PASS |
Harmful content | FAIL | PASS |
The remaining failures were edge cases (e.g., chemistry terms in technical job descriptions). Advisory-level, not ship-stoppers.
What prompt hardening won’t fix
Embedded model bias: prompting constrains behavior; it doesn't retrain weights. You still need evaluation across groups.
Novel attacks: security isn't one-time. You need regression testing as prompts evolve.
Bad requirements: a hardened prompt implementing flawed criteria is still flawed.
The point
Your prompt is part of your security boundary. Most teams don't test it until something breaks in production.
Full hardened prompt on GitHub: GitHub
Disclosure: We build EarlyCore. The hardening patterns above are tool-agnostic.
Want to run the same suite on your prompt?
