Two architectural choices affect cost and quality at scale: synchronous vs Message Batches API, and single-pass vs multi-pass review for large PRs. This lesson covers when to use each.
## Sync vs Batches: Match the API to the SLA
| API | Use for | Notes |
|---|---|---|
| Synchronous | Blocking pre-merge checks, real-time UX | Full latency SLA, full price |
| Message Batches | Overnight reports, weekly audits, nightly test generation | 50% cost savings, up to 24h processing, no latency SLA |
The simple rule: if a human is blocked waiting for the result, use synchronous. If the result is consumed asynchronously (a report generated overnight, a nightly audit), use batches.
## Batches API Constraints
- No multi-turn tool calling within a single request. Each request is one round-trip. If your task needs the model to call tools and continue, batches isn't the right API.
- Always set `custom_id` on each request. The `custom_id` is your correlation key: without it, you can't match results to inputs, and you can't resubmit specific failures without resubmitting the whole batch.
- Refine prompts on a sample first. Submitting 10,000 batched requests with a flawed prompt wastes money and time. Test with 100 first.
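As a sketch, the first two constraints look like this in practice. The request shape follows the Message Batches API; the file paths, PR id, and model name are hypothetical placeholders:

```python
# Build one batched review request per file, tagging each with a
# custom_id so results can be matched back to inputs and individual
# failures can be resubmitted. Model name and paths are assumptions.

def build_requests(pr_id: str, files: dict[str, str]) -> list[dict]:
    """One request per file, each tagged with a custom_id for correlation."""
    return [
        {
            "custom_id": f"{pr_id}:{path}",  # correlation key: PR + file path
            "params": {
                "model": "claude-sonnet-4-20250514",  # assumed model name
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": f"Review this file:\n\n{source}"}
                ],
            },
        }
        for path, source in files.items()
    ]

def correlate(results: list[dict]) -> tuple[dict, list[str]]:
    """Split batch results into successes keyed by custom_id, plus a list
    of failed custom_ids that can be resubmitted on their own."""
    ok, failed = {}, []
    for r in results:
        if r["result"]["type"] == "succeeded":
            ok[r["custom_id"]] = r["result"]["message"]
        else:  # errored, canceled, or expired
            failed.append(r["custom_id"])
    return ok, failed
```

The `requests` list would be passed to the batch-creation endpoint; because each request is self-contained, resubmitting only the `failed` ids is cheap.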
## SLA Calculation Example
You need to guarantee a 30-hour SLA on report generation. Batches take up to 24 hours to process. To stay within the SLA, submit at most every 4 hours: an item then waits at most 4 hours for the next submission plus up to 24 hours of processing, 28 hours worst case, leaving 2 hours of buffer for retries and downstream consumers. Never submit so close to the SLA that a slow batch eats your buffer.
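A minimal sketch of the interval arithmetic; the 30-hour SLA and 24-hour processing window are the example's numbers, and the helper name is an assumption:

```python
def max_submit_interval(sla_hours: float,
                        processing_hours: float = 24.0,
                        buffer_hours: float = 0.0) -> float:
    """Longest allowed gap between batch submissions.

    Worst case for one item: wait for the next submission (the interval)
    + full processing window. That sum plus any reserved buffer must fit
    inside the SLA, so: interval = SLA - processing - buffer.
    """
    return sla_hours - processing_hours - buffer_hours
```

With a 30-hour SLA the hard ceiling is a 6-hour interval; reserving a 2-hour buffer brings it down to the 4-hour cadence used above.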
## Pre-Merge Blocking Checks Are Synchronous
If your CI gates merging on a Claude review, that's blocking — synchronous API. Putting it on batches would make every PR wait up to 24 hours for merge clearance. Unworkable.
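A sketch of the gating side of such a check. The verdict format (first line starts with `APPROVE`) and the `merge_exit_code` helper are hypothetical conventions of this sketch; the synchronous Messages API call is shown only as a comment:

```python
# Hypothetical pre-merge gate: CI blocks on the exit code, so the
# review request must be a synchronous (blocking) call.

def merge_exit_code(review: str) -> int:
    """0 lets CI proceed to merge; 1 blocks it. Assumes the review's
    first line carries the verdict."""
    stripped = review.strip()
    verdict = stripped.splitlines()[0].upper() if stripped else ""
    return 0 if verdict.startswith("APPROVE") else 1

# In CI this would wrap a blocking call, e.g.:
#   message = client.messages.create(model=..., max_tokens=...,
#       messages=[{"role": "user", "content": diff}])
#   sys.exit(merge_exit_code(message.content[0].text))
```

Note the fail-closed default: an empty or malformed review blocks the merge rather than approving it.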
## Multi-Pass Review for Large PRs
Self-review limitation: if Claude generates code and immediately reviews it, the model retains its reasoning context — it's less likely to question its own decisions. Multi-pass with separate model contexts addresses this.
For PRs with 14+ files, single-pass review is shallow. Attention spreads across all files; each file gets a superficial look. The fix is two passes:
- Pass 1: per-file local analysis. Each file reviewed in its own context. Depth and consistency improve.
- Pass 2: cross-file integration analysis. Data flow, shared state, contradictions in approach across files.
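The two passes above can be sketched as prompt builders. The prompt wording is illustrative, and each prompt would be sent as its own request so the passes run in separate model contexts:

```python
# Sketch of the two-pass structure: pass 1 isolates each file, pass 2
# reviews the combined findings for cross-file issues. Function names
# and prompt text are assumptions of this sketch.

def pass_one_prompts(files: dict[str, str]) -> dict[str, str]:
    """Pass 1: one isolated prompt per file, so attention stays local."""
    return {
        path: f"Review only this file for bugs and style issues:\n\n{src}"
        for path, src in files.items()
    }

def pass_two_prompt(per_file_findings: dict[str, str]) -> str:
    """Pass 2: a single integration prompt over the pass-1 findings."""
    sections = "\n\n".join(
        f"## {path}\n{finding}" for path, finding in per_file_findings.items()
    )
    return (
        "Given these per-file review findings, analyze cross-file issues: "
        "data flow, shared state, and contradictions in approach.\n\n"
        + sections
    )
```

Pass 2 deliberately consumes summaries rather than raw source, which keeps the integration context small even for large PRs.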
## Larger Context Windows Don't Fix Attention Quality
It's tempting to assume that if all 14 files fit in context, single-pass works. They do fit. The review is still shallow. Attention quality degrades before context capacity does. The fix is workflow (multiple passes), not capacity (bigger window).
## Skills to Develop
- Match API to SLA: synchronous for blocking, batches for async.
- Set `custom_id` on every batch request for correlation.
- Refine prompts on a sample before batching at scale.
- Calculate SLA windows: submission interval + 24h processing window + buffer ≤ total SLA.
- Use multi-pass review (per-file + integration) for large PRs.