Vague instructions like "be conservative" or "only report high-confidence findings" sound reasonable but reliably fail to improve precision. Models can't calibrate confidence well enough for those phrases to be actionable. This lesson covers how to write categorical, criterion-based instructions that actually move the needle.
Why Vague Instructions Fail
- "Be conservative." What does conservative mean for this task? The model guesses. Different turns produce different definitions.
- "Only report high-confidence findings." The model's confidence is poorly calibrated. "High confidence" for one task means 70%+, for another 95%+. The phrase has no shared meaning.
- "Avoid false positives." Without specifying what counts as a false positive, the model can't enforce the rule.
Categorical Criteria Beat Confidence Phrasing
Instead of "be conservative," specify exactly what to flag and what to skip:
Flag a comment ONLY when the behavior it describes contradicts
the actual code behavior in the same function. Do NOT flag
formatting or style differences. Do NOT flag stale comments
that are merely outdated rather than actively misleading.
Now the model has a concrete rule. The category "contradicts behavior" is testable. "Conservative" isn't.
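A categorical rule can also be made machine-checkable by asking for structured output and discarding anything outside the allowed category. A minimal sketch in Python; the prompt wording mirrors the rule above, but the JSON field names and `build_prompt` helper are illustrative assumptions, not a fixed API:

```python
# Sketch: attach categorical flag/skip criteria to a review prompt
# and require structured output so off-category findings can be dropped.

CATEGORICAL = """\
Flag a comment ONLY when the behavior it describes contradicts
the actual code behavior in the same function.
Do NOT flag formatting or style differences.
Do NOT flag stale comments that are merely outdated
rather than actively misleading.

For each finding, output a JSON object:
  {"line": <int>, "rule": "contradicts-behavior", "evidence": "<quote>"}
Findings whose "rule" is anything else will be discarded."""

def build_prompt(diff: str) -> str:
    """Prepend the categorical criteria to the review request."""
    return f"{CATEGORICAL}\n\nReview this diff:\n{diff}"
```

Because the rule names a single allowed category, a post-filter on the `"rule"` field enforces it mechanically; "be conservative" offers nothing to filter on.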
The False-Positive Trust Problem
If a category of finding has high false positives — say, 30% of "security risks" flagged are actually fine — those false positives undermine trust in the categories that ARE accurate. Reviewers stop reading because they assume each finding is noise.
The fix has two parts: temporarily disable the high-FP category while you improve its prompt, and keep the accurate categories live. Don't let one bad category drag down the others' credibility.
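The disable-while-improving step can live in a small post-filter rather than in the prompt itself, so the accurate categories ship unchanged. A sketch, with illustrative category names:

```python
# Sketch: suppress a high-false-positive category while its prompt
# is reworked; all other categories pass through untouched.

DISABLED_CATEGORIES = {"security-risk"}  # ~30% FP rate; prompt under revision

def filter_findings(findings: list[dict]) -> list[dict]:
    """Drop findings in disabled categories; keep everything else."""
    return [f for f in findings if f["category"] not in DISABLED_CATEGORIES]

findings = [
    {"category": "security-risk", "msg": "possible injection"},
    {"category": "contradicts-behavior", "msg": "comment says X, code does Y"},
]
kept = filter_findings(findings)  # only the contradicts-behavior finding survives
```

Keeping the switch outside the prompt means re-enabling the category later is a one-line change, with no risk of perturbing the instructions that already work.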
Severity Criteria with Examples
If your task involves grading severity (security, performance, code quality), define severity levels with concrete code examples:
## Severity Levels
CRITICAL: Code that allows unauthenticated access to sensitive data.
Example: A route handler that reads user data without checking session.
HIGH: Code that could cause data corruption or financial loss.
Example: A SQL query without parameterization, exposing injection.
MEDIUM: Code that degrades performance noticeably under load.
Example: An N+1 query in a hot path.
LOW: Style or convention violations with no behavioral impact.
Example: Inconsistent naming.
The examples anchor the categories. Without examples, the model fills in the definitions, often inconsistently.
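One way to keep the rubric and its enforcement in sync is to store the levels as data and render the prompt section from it. A sketch, assuming the structure below (the wording mirrors the rubric above; the helper names are hypothetical):

```python
# Sketch: severity levels with anchoring examples kept as data, so the
# rendered rubric and output validation share one source of truth.

SEVERITY = {
    "CRITICAL": ("Code that allows unauthenticated access to sensitive data.",
                 "A route handler that reads user data without checking session."),
    "HIGH": ("Code that could cause data corruption or financial loss.",
             "A SQL query without parameterization, exposing injection."),
    "MEDIUM": ("Code that degrades performance noticeably under load.",
               "An N+1 query in a hot path."),
    "LOW": ("Style or convention violations with no behavioral impact.",
            "Inconsistent naming."),
}

def render_rubric() -> str:
    """Render the severity section to paste into the prompt."""
    lines = ["## Severity Levels"]
    for level, (definition, example) in SEVERITY.items():
        lines.append(f"{level}: {definition}")
        lines.append(f"Example: {example}")
    return "\n".join(lines)

def is_valid_severity(label: str) -> bool:
    """Reject any severity label the rubric does not define."""
    return label in SEVERITY
```

A model that invents a level like "BLOCKER" then fails `is_valid_severity` instead of silently widening the rubric.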
The Customer Support Example
A customer support agent has 55% first-contact resolution and is escalating simple cases. The fix is NOT "add confidence scores" or "detect customer sentiment" — those are vague proxies. The fix is explicit escalation criteria with few-shot examples: when to escalate, when not to, and three examples of each.
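Such criteria might look like the sketch below: categorical triggers plus paired few-shot examples, held in one prompt constant. The specific thresholds, triggers, and example tickets are illustrative assumptions, not real policy:

```python
# Sketch: explicit escalation criteria with paired few-shot examples,
# replacing vague proxies like confidence scores or sentiment detection.

ESCALATION_PROMPT = """\
Escalate ONLY when one of these holds:
- The customer requests a refund above your authorization limit.
- The issue involves a legal or compliance question.
- The same customer has contacted us 3+ times about this issue.

Do NOT escalate password resets, billing-address updates,
or shipping-status questions.

Examples of ESCALATE:
1. "I was charged twice and want $400 back."  (above refund limit)
2. "My lawyer will be in touch about this data breach."  (legal)
3. "This is my fourth email about the broken order."  (repeat contact)

Examples of DO NOT ESCALATE:
1. "How do I reset my password?"
2. "Can you update my shipping address?"
3. "Where is my package?"
"""
```

As with the comment-review rule, each trigger is testable against the ticket text, so disagreements about a wrong escalation point to a specific criterion to fix rather than to an unobservable confidence level.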
Skills to Develop
- Replace vague directives ("conservative", "high-confidence") with categorical criteria stating what to flag and what to skip.
- Disable high-false-positive categories temporarily while improving their prompts; don't let bad categories tarnish good ones.
- Define severity levels with concrete code examples.