Two architectural choices affect cost and quality at scale: synchronous vs Message Batches API, and single-pass vs multi-pass review for large PRs. This lesson covers when to use each.
## Sync vs Batches: Match the API to the SLA
| API | Use for | Notes |
|---|---|---|
| Synchronous | Blocking pre-merge checks, real-time UX | Full latency SLA, full price |
| Message Batches | Overnight reports, weekly audits, nightly test generation | 50% cost savings, up to 24h processing, no latency SLA |
The simple rule: if a human is blocked waiting for the result, use synchronous. If the result is consumed asynchronously (a report generated overnight, a nightly audit), use batches.
## Batches API Constraints
- No multi-turn tool calling within a single request. Each request is one round-trip. If your task needs the model to call tools and continue, batches isn't the right API.
- Always set `custom_id` on each request. The `custom_id` is your correlation key: without it, you can't match results to inputs, and you can't resubmit specific failures without resubmitting the whole batch.
- Refine prompts on a sample first. Submitting 10,000 batched requests with a flawed prompt wastes money and time. Test with 100 first.
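As a sketch, the first two constraints look like this in practice. The request shape follows the Message Batches API; the file paths, PR id, and model name are hypothetical placeholders:

```python
# Build one batched review request per file, tagging each with a
# custom_id so results can be matched back to inputs and individual
# failures can be resubmitted. Model name and paths are assumptions.

def build_requests(pr_id: str, files: dict[str, str]) -> list[dict]:
    """One request per file, each tagged with a custom_id for correlation."""
    return [
        {
            "custom_id": f"{pr_id}:{path}",  # correlation key: PR + file path
            "params": {
                "model": "claude-sonnet-4-20250514",  # assumed model name
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": f"Review this file:\n\n{source}"}
                ],
            },
        }
        for path, source in files.items()
    ]

def correlate(results: list[dict]) -> tuple[dict, list[str]]:
    """Split batch results into successes keyed by custom_id, plus a list
    of failed custom_ids that can be resubmitted on their own."""
    ok, failed = {}, []
    for r in results:
        if r["result"]["type"] == "succeeded":
            ok[r["custom_id"]] = r["result"]["message"]
        else:  # errored, canceled, or expired
            failed.append(r["custom_id"])
    return ok, failed
```

The `requests` list would be passed to the batch-creation endpoint; because each request is self-contained, resubmitting only the `failed` ids is cheap.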
## SLA Calculation Example
You need to guarantee a 30-hour SLA on report generation. Batches take up to 24 hours to process. To stay within the SLA, submit at most every 4 hours: an item then waits at most 4 hours for the next submission plus up to 24 hours of processing, 28 hours worst case, leaving 2 hours of buffer for retries and downstream consumers. Never submit so close to the SLA that a slow batch eats your buffer.
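A minimal sketch of the interval arithmetic; the 30-hour SLA and 24-hour processing window are the example's numbers, and the helper name is an assumption:

```python
def max_submit_interval(sla_hours: float,
                        processing_hours: float = 24.0,
                        buffer_hours: float = 0.0) -> float:
    """Longest allowed gap between batch submissions.

    Worst case for one item: wait for the next submission (the interval)
    + full processing window. That sum plus any reserved buffer must fit
    inside the SLA, so: interval = SLA - processing - buffer.
    """
    return sla_hours - processing_hours - buffer_hours
```

With a 30-hour SLA the hard ceiling is a 6-hour interval; reserving a 2-hour buffer brings it down to the 4-hour cadence used above.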
## Pre-Merge Blocking Checks Are Synchronous
If your CI gates merging on a Claude review, that's blocking — synchronous API. Putting it on batches would make every PR wait up to 24 hours for merge clearance. Unworkable.
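A sketch of the gating side of such a check. The verdict format (first line starts with `APPROVE`) and the `merge_exit_code` helper are hypothetical conventions of this sketch; the synchronous Messages API call is shown only as a comment:

```python
# Hypothetical pre-merge gate: CI blocks on the exit code, so the
# review request must be a synchronous (blocking) call.

def merge_exit_code(review: str) -> int:
    """0 lets CI proceed to merge; 1 blocks it. Assumes the review's
    first line carries the verdict."""
    stripped = review.strip()
    verdict = stripped.splitlines()[0].upper() if stripped else ""
    return 0 if verdict.startswith("APPROVE") else 1

# In CI this would wrap a blocking call, e.g.:
#   message = client.messages.create(model=..., max_tokens=...,
#       messages=[{"role": "user", "content": diff}])
#   sys.exit(merge_exit_code(message.content[0].text))
```

Note the fail-closed default: an empty or malformed review blocks the merge rather than approving it.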
## Multi-Pass Review for Large PRs
Self-review limitation: if Claude generates code and immediately reviews it, the model retains its reasoning context — it's less likely to question its own decisions. Multi-pass with separate model contexts addresses this.
For PRs with 14+ files, single-pass review is shallow. Attention spreads across all files; each file gets a superficial look. The fix is two passes:
- Pass 1: per-file local analysis. Each file reviewed in its own context. Depth and consistency improve.
- Pass 2: cross-file integration analysis. Data flow, shared state, contradictions in approach across files.
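The two passes above can be sketched as prompt builders. The prompt wording is illustrative, and each prompt would be sent as its own request so the passes run in separate model contexts:

```python
# Sketch of the two-pass structure: pass 1 isolates each file, pass 2
# reviews the combined findings for cross-file issues. Function names
# and prompt text are assumptions of this sketch.

def pass_one_prompts(files: dict[str, str]) -> dict[str, str]:
    """Pass 1: one isolated prompt per file, so attention stays local."""
    return {
        path: f"Review only this file for bugs and style issues:\n\n{src}"
        for path, src in files.items()
    }

def pass_two_prompt(per_file_findings: dict[str, str]) -> str:
    """Pass 2: a single integration prompt over the pass-1 findings."""
    sections = "\n\n".join(
        f"## {path}\n{finding}" for path, finding in per_file_findings.items()
    )
    return (
        "Given these per-file review findings, analyze cross-file issues: "
        "data flow, shared state, and contradictions in approach.\n\n"
        + sections
    )
```

Pass 2 deliberately consumes summaries rather than raw source, which keeps the integration context small even for large PRs.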
## Larger Context Windows Don't Fix Attention Quality
It's tempting to assume that if all 14 files fit in context, single-pass works. They do fit. The review is still shallow. Attention quality degrades before context capacity does. The fix is workflow (multiple passes), not capacity (bigger window).
## Skills to Develop
- Match API to SLA: synchronous for blocking, batches for async.
- Set `custom_id` on every batch request for correlation.
- Refine prompts on a sample before batching at scale.
- Calculate SLA windows: submission interval + 24h processing window + buffer ≤ total SLA.
- Use multi-pass review (per-file + integration) for large PRs.