When NOT to add AI to your product

Some product teams spend a quarter adding AI to a feature that didn’t need it. The result is reliably the same: a slower, more expensive, less reliable version of what worked before, with a complicated explanation in the release notes. Here are the four tests we run with clients before any AI implementation work starts.

Test 1: would a sufficiently smart human do this?

If the answer is no — the workflow is purely mechanical, deterministic, and well-defined — you do not have an AI problem. You have an automation problem, and AI is the wrong tool. A spreadsheet macro, a database trigger, or a 50-line script will outperform any model on a deterministic task. It will also be cheaper, faster, easier to debug, and not subject to model drift.
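To make that concrete, here is a minimal sketch of the kind of short script we mean. The overdue-invoice rule, the 7-day grace period, and the names Invoice, is_overdue, and overdue_ids are all hypothetical, chosen only for illustration. The same inputs always produce the same output, so there is nothing for a model to add.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical business rule, for illustration only: an invoice is overdue
# when it is unpaid and more than GRACE_DAYS past its due date.
GRACE_DAYS = 7


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    due: date
    paid: bool


def is_overdue(invoice: Invoice, today: date) -> bool:
    """Deterministic check: same invoice and date always give the same answer."""
    if invoice.paid:
        return False
    return (today - invoice.due).days > GRACE_DAYS


def overdue_ids(invoices: list[Invoice], today: date) -> list[str]:
    """Return the IDs of overdue invoices, preserving input order."""
    return [inv.invoice_id for inv in invoices if is_overdue(inv, today)]


if __name__ == "__main__":
    sample = [
        Invoice("A-1", date(2024, 1, 1), paid=False),   # 24 days late
        Invoice("A-2", date(2024, 1, 1), paid=True),    # paid, never overdue
        Invoice("A-3", date(2024, 1, 20), paid=False),  # inside grace period
    ]
    print(overdue_ids(sample, today=date(2024, 1, 25)))  # prints ['A-1']
```

Every branch here can be unit-tested and read in a minute, which is exactly what you give up when a model sits in the same place.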

If the answer is yes — a smart human would interpret, summarise, judge, or improvise — then AI is at least a candidate. But notice that this is the floor, not the ceiling. Many tasks pass test 1 and fail test 2.

Test 2: is the input domain bounded?

AI works well when the inputs are constrained: emails, support tickets, product reviews, legal documents in one jurisdiction. AI fails badly when the input domain is everything: ‘users can ask anything.’ Open-ended chatbots without scope are notoriously bad products, regardless of model quality, because the failure surface is infinite.

If you cannot describe the input domain in 30 seconds (‘support tickets in English about our SaaS product’ is fine; ‘general business questions’ is not), you have not bounded the problem enough to ship reliably. Either narrow it, or don’t build it. Many AI products fail this test and ship anyway.
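One way to hold yourself to a bounded domain is to write it down as data and check every input against it before a model sees it. The sketch below is an illustration under assumptions, not a prescribed design: the InputDomain class, the route function, the channel names, and the 5,000-character limit are all hypothetical, and it assumes your ticketing system already records each message's channel and language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputDomain:
    """An explicit, checkable statement of what the feature accepts."""
    description: str
    channels: frozenset[str]
    languages: frozenset[str]
    max_chars: int

    def accepts(self, channel: str, language: str, text: str) -> bool:
        # In scope only if channel, language, and length all match.
        return (
            channel in self.channels
            and language in self.languages
            and 0 < len(text.strip()) <= self.max_chars
        )


# Hypothetical values; the description is the 30-second sentence itself.
SUPPORT_TICKETS = InputDomain(
    description="support tickets in English about our SaaS product",
    channels=frozenset({"support_form", "support_email"}),
    languages=frozenset({"en"}),
    max_chars=5000,
)


def route(channel: str, language: str, text: str) -> str:
    """Send in-scope inputs to the model and everything else to a person."""
    if SUPPORT_TICKETS.accepts(channel, language, text):
        return "model"
    return "human"
```

If you cannot fill in the fields of something like InputDomain, that is usually a sign the domain is still ‘everything’.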

Test 3: what does ‘wrong’ cost?

An AI that’s wrong 5% of the time is fine for some tasks (suggesting an email draft) and catastrophic for others (deciding whether to grant credit). Map the cost-of-wrong before you build. If a wrong answer costs the user time, that’s tolerable. If it costs the user money, you need a confirmation step. If it costs them legal exposure, a regulatory complaint, or harm to a third party, you probably shouldn’t be building it as an AI feature at all.
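The mapping can be written down as a small lookup before any build work starts. This is a sketch only: the three cost tiers and the three responses come from the paragraph above, while the names CostOfWrong, Safeguard, and required_safeguard are invented for illustration, and a real product would likely need finer tiers.

```python
from enum import Enum, IntEnum


class CostOfWrong(IntEnum):
    """What a wrong answer costs the user, ordered by severity."""
    TIME = 1                  # the user loses time
    MONEY = 2                 # the user loses money
    LEGAL_OR_THIRD_PARTY = 3  # legal exposure, regulatory complaint, third-party harm


class Safeguard(Enum):
    TOLERABLE = "tolerable as-is"
    CONFIRMATION_STEP = "add a confirmation step"
    RECONSIDER = "probably should not be an AI feature"


_SAFEGUARD_BY_COST = {
    CostOfWrong.TIME: Safeguard.TOLERABLE,
    CostOfWrong.MONEY: Safeguard.CONFIRMATION_STEP,
    CostOfWrong.LEGAL_OR_THIRD_PARTY: Safeguard.RECONSIDER,
}


def required_safeguard(costs: set[CostOfWrong]) -> Safeguard:
    """Return the safeguard for the most severe cost a wrong answer can incur."""
    if not costs:
        raise ValueError("list at least one cost of a wrong answer")
    return _SAFEGUARD_BY_COST[max(costs)]


if __name__ == "__main__":
    # Suggesting an email draft: a bad draft only wastes time.
    print(required_safeguard({CostOfWrong.TIME}).value)
    # Deciding whether to grant credit: money and regulatory exposure.
    print(required_safeguard(
        {CostOfWrong.MONEY, CostOfWrong.LEGAL_OR_THIRD_PARTY}
    ).value)
```

Taking the maximum is a deliberate choice: a feature is governed by the worst thing a wrong answer can do, not the typical thing.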

We’ve turned down rescue engagements where the cost-of-wrong was too high for any achievable model accuracy to justify. If you’re in a regulated sector (financial services, healthcare, legal), this test is the most important one and the easiest to skip in the rush to ship.

Test 4: are you adding AI for the press release?

There is a category of feature that exists only because the company committed publicly to having AI in the product. These features are easy to spot: they’re not driven by a user problem, they don’t have a clear metric, and the team can’t explain how the AI version is better than the non-AI version it replaced. If you find yourself building one, the honest move is to scope it down to a single workflow where AI demonstrably wins, and ship that.

Every quarter, one or two clients ask us to help them quietly retire an AI feature that nobody is using. There’s no shame in this; it’s the right call. But not building it in the first place would have been cheaper.

If you’ve already started — what to do

If you’ve passed the point of no return on a feature that fails one or more of these tests, the question is not ‘should we have built this?’ It’s ‘what’s the minimum delta to make it useful?’ Sometimes the answer is to narrow scope dramatically — a chatbot becomes a copilot for one specific workflow. Sometimes the answer is to remove the AI and keep the underlying improvement (better data, better UI). Sometimes it’s to ship and let usage data decide.

We do this kind of triage often. If you’re three months into a build and quietly worried about whether it should ship, a rescue triage week will surface the real options. Honest input, no upsell.
