
PROJECT RESCUE
Why most AI projects fail at the integration boundary
The model works. The prompts work. The eval harness is green. The project still fails — because nobody owned the integration boundary between the AI service and the rest of the product, and that boundary turns out to be where reliability lives. This is the single most common failure pattern we see in mid-stage AI builds.
Where the boundary lives
Every production AI feature has at least one boundary: the line between code that calls the model and code that does everything else. In simple builds this is a single API call with a function around it. In real products, the boundary is broader: it includes authentication, rate limiting, retries, fallbacks, telemetry, output validation, and the UI behaviour while the model thinks.
Teams that frame the AI work as ‘the model layer’ and the product work as ‘everything else’ tend to end up with two strong halves joined by a thin boundary. That thin boundary is where users actually experience reliability, and it is where the project quietly fails.
The four failure modes at the boundary
First: timeout opacity. The model is slow today (5–15s tail latencies). The product UI shows a generic spinner. Users abandon. The fix is at the boundary: streaming responses, optimistic state, time-bounded fallbacks. Second: silent error pass-through. The model returns malformed JSON. The boundary doesn’t validate. The product crashes downstream and logs are unhelpful.
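A minimal sketch of the first two fixes, assuming an async model client that returns raw text. The names here (`bounded_call`, `parse_reply`, `FALLBACK`, the `answer` field, the 8-second default) are hypothetical and chosen for illustration, not taken from any particular SDK. The call is time-bounded, the reply is validated before it reaches the product, and both failure paths return a labelled fallback instead of crashing downstream.

```python
import asyncio
import json

# Hypothetical fallback payload shown when the model is slow or returns junk.
FALLBACK = {"answer": "Sorry, this is taking longer than expected.", "source": "fallback"}


def parse_reply(raw: str) -> dict:
    """Validate the model's raw text; raise ValueError on anything malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model returned invalid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        raise ValueError("model reply is missing a string 'answer' field")
    return {"answer": data["answer"], "source": "model"}


async def bounded_call(call_model, prompt: str, timeout_s: float = 8.0) -> dict:
    """Call the model with a hard deadline and validate what comes back."""
    try:
        raw = await asyncio.wait_for(call_model(prompt), timeout=timeout_s)
        return parse_reply(raw)
    except asyncio.TimeoutError:
        # Time-bounded fallback: the UI gets an answer instead of an endless spinner.
        return dict(FALLBACK, reason="timeout")
    except ValueError as exc:
        # Malformed output is caught here, with a reason that can be logged.
        return dict(FALLBACK, reason=str(exc))
```

The `reason` field is there so the failure shows up in logs with a cause attached, which is the part that silent pass-through loses.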
Third: rate-limit cascades. The product spikes traffic. The model API rate-limits. The boundary doesn’t back off or queue. The product surfaces a 500 to the user. Fourth: context-window leakage. A long conversation hits the context limit. The boundary doesn’t summarise or truncate gracefully. The whole conversation breaks. All four are boundary failures, not model failures.
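For the fourth failure mode, graceful truncation can be sketched as follows. This is an illustration under stated assumptions: messages are dicts with `role` and `content` keys, and `estimate_tokens` uses a crude four-characters-per-token guess that a real build would replace with the provider's tokenizer. Both function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. Swap in a real tokenizer.
    return max(1, len(text) // 4)


def fit_to_budget(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep system messages plus the most recent turns that fit in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)

    kept = []
    for msg in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if cost > budget:
            break  # everything older than this point is dropped
        kept.append(msg)
        budget -= cost
    return system + kept[::-1]  # restore chronological order
```

Dropping the oldest turns is the simplest policy. Summarising the dropped turns into one short message is the natural next step, at the cost of an extra model call.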
Auth, rate limits, fallbacks: the integration triad
Three things every AI integration boundary needs and most ship without. Auth: who is allowed to use this, with what budget, audited how. Skipping this is how teams discover three months later that one customer is generating 80% of their model spend. Rate limits: per-user and per-tenant, with explicit fairness, not just whatever the upstream provider enforces. Fallbacks: when the model is unavailable, the product does something reasonable — degrade to a simpler experience, return cached results, queue for later.
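The rate-limit and budget parts of the triad can be sketched with a per-tenant token bucket plus a spend ledger. This is a minimal in-memory illustration, not a production design: `TenantLimiter` and its methods are hypothetical names, and a real deployment would typically keep this state in a shared store so that every service instance sees the same counts.

```python
import time
from collections import defaultdict


class TenantLimiter:
    """Per-tenant token bucket with a simple spend ledger (in-memory sketch)."""

    def __init__(self, rate_per_s: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_s
        self.burst = burst
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = defaultdict(lambda: float(burst))
        self.last_seen = {}
        self.spend = defaultdict(float)

    def allow(self, tenant: str) -> bool:
        """Refill this tenant's bucket for the elapsed time, then try to take one token."""
        now = self.clock()
        elapsed = now - self.last_seen.get(tenant, now)
        self.last_seen[tenant] = now
        self.tokens[tenant] = min(self.burst, self.tokens[tenant] + elapsed * self.rate)
        if self.tokens[tenant] < 1:
            return False
        self.tokens[tenant] -= 1
        return True

    def record_spend(self, tenant: str, usd: float) -> None:
        self.spend[tenant] += usd

    def spend_share(self, tenant: str) -> float:
        """Fraction of total recorded spend attributable to one tenant."""
        total = sum(self.spend.values())
        return self.spend.get(tenant, 0.0) / total if total else 0.0
```

Because each tenant has its own bucket, one noisy customer exhausts only their own allowance, and `spend_share` makes the 80% case visible well before the invoice does.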
All three are unglamorous. None of them feel like AI work. All of them are essential, and all of them belong on the engineering plan from day one rather than ‘we’ll add it before launch.’ The ‘before launch’ version is always thinner than the version you would have built deliberately.
What we put on the boundary in our builds
Our default boundary stack on AI projects: a thin service that wraps the model call with auth, per-tenant rate limiting, structured input/output validation (we use schemas), retry with exponential backoff, dead-letter queue for systematic failures, full structured logging of inputs, outputs, and timings (PII-redacted), and a feature flag that lets us toggle the model off and route to a fallback path.
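A compressed sketch of how three pieces of that stack fit together: the feature flag, retry with exponential backoff, and the dead-letter path. It is an illustration under assumptions rather than our production code. `call_with_boundary`, `TransientModelError`, and all parameter names are hypothetical, the dead-letter queue is modelled as a plain list, and auth, schema validation, and rate limiting are left out for brevity.

```python
import logging
import random
import time

log = logging.getLogger("ai_boundary")


class TransientModelError(Exception):
    """Hypothetical error type for retryable failures (rate limits, 5xx)."""


def call_with_boundary(call_model, prompt, *, model_enabled=True,
                       fallback=lambda p: "unavailable", max_attempts=4,
                       base_delay_s=0.5, dead_letters=None, sleep=time.sleep):
    """Wrap a model call with a kill switch, backoff retries, and a fallback."""
    if not model_enabled:  # feature flag: route straight to the fallback path
        return fallback(prompt)

    for attempt in range(1, max_attempts + 1):
        start = time.monotonic()
        try:
            result = call_model(prompt)
            # Log timings and attempt counts, not raw prompt text (PII).
            log.info("model_call ok attempt=%d elapsed_s=%.3f",
                     attempt, time.monotonic() - start)
            return result
        except TransientModelError as exc:
            log.warning("model_call failed attempt=%d error=%s", attempt, exc)
            if attempt < max_attempts:
                delay = base_delay_s * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
                sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries

    # Every attempt failed: park the request for later inspection, then degrade.
    if dead_letters is not None:
        dead_letters.append(prompt)
    return fallback(prompt)
```

The `sleep` parameter is injectable only so the retry schedule can be tested without waiting. Only `TransientModelError` is retried, so non-retryable errors surface immediately instead of being hammered four times.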
This is usually 1–2 weeks of work and it’s never fun. It also never gets cut from a build we run, because we’ve seen what happens when it’s missing. The economics are simple: a project rescued because of boundary failure costs 3–5× what it would have cost to do the boundary right the first time.
Already at this stage?
If your AI build is in production and the user experience is unreliable in ways your eval harness doesn’t catch, this is almost certainly where the gap is. The model is fine. The product is fine. The boundary is undercooked. Triage of an existing AI build to find boundary gaps is one of the most common engagement shapes we run.
Typical week-1 finding in these engagements: 5–8 specific boundary gaps, each fixable in 1–3 days. Typical 6-week outcome: a previously flaky AI feature that is usable, debuggable, and monitorable in production. Project rescue is the service for this. Book a triage if you suspect this is your situation.