Every week, someone asks me the same thing in a slightly different way. “Should we add AI to our product?” “How do we tell a better AI story to investors?” “What’s the AI version of our feature?” The honest answer is usually: you’re asking the wrong question.

This is a practical take on when AI actually belongs in your product, when it does not, what adding it costs, and how to tell the difference. Written in 2026, when the gap between “AI hype” and “AI that moves numbers” has never been wider.

The question you should be asking

The question is not “should we add AI?”. The question is: is there a task inside our product today where the cost or speed of doing it is a real constraint on our business?

That is the entire decision framework in one sentence. If the answer is yes, AI might be the cheapest way to change that cost or speed. If the answer is no, adding AI is either an expensive feature nobody asked for, or a pitch-deck signal nobody will remember in six months.

Every AI feature I have seen work well solves a task that was already being done, usually by humans, usually expensively. Every AI feature I have seen fail was added because “AI” was the brief, not because a cost or speed problem needed solving.

Five good reasons to add AI

1. Your users are already doing the task manually, at scale

If users of your product are already classifying, summarising, drafting, or matching things by hand, they are telling you where AI belongs. Look at support tickets, look at power-user workflows, look at the feature requests that start with “it would be great if the app could just…”.

Concrete example: a B2B SaaS client had users manually tagging uploaded documents by type. Average time per document: forty seconds. Volume per enterprise customer: two thousand documents a month. That is twenty-two hours of manual work per customer per month. An LLM-powered auto-tagging feature paid for itself inside the first billing cycle.
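The time arithmetic in that example is worth writing out. A quick sketch, with the figures taken straight from the paragraph above:

```python
# Worked numbers from the document-tagging example.
# Both inputs come from the paragraph above; nothing here is measured fresh.
docs_per_month = 2_000
seconds_per_doc = 40

manual_hours = docs_per_month * seconds_per_doc / 3600
print(f"Manual tagging: {manual_hours:.1f} hours per customer per month")
# 2,000 docs x 40 seconds = 80,000 seconds, roughly 22.2 hours
```

Twenty-two hours of manual work per customer per month is the number the auto-tagging feature had to beat, and it did so inside one billing cycle.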

2. You have clear cost-per-task economics

If you know what a task costs today (staff time, outsourced work, human-in-the-loop review) and you can compare it to the cost of doing the same task via an LLM, the decision is arithmetic. Modern LLM pricing for most practical tasks sits between £0.001 and £0.05 per call. If the human cost is £5 and the AI cost is £0.01, you have a viable case.
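That arithmetic, using the paragraph's own illustrative figures (these are not real prices for any specific vendor or role):

```python
# Break-even check using the illustrative figures from the paragraph above.
human_cost_per_task = 5.00   # £ per task when a person does it
ai_cost_per_task = 0.01      # £ per LLM call, within the quoted hosted-API range

ratio = human_cost_per_task / ai_cost_per_task
print(f"AI is {ratio:.0f}x cheaper per task")  # 500x
```

When the ratio is that lopsided, the decision is arithmetic. When it is close to 1x, it is not a decision at all.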

If you cannot write down the current cost of the task, you are not ready to add AI. You are ready to measure first.

3. Users are asking for it in specific, repeated ways

Not “we want AI”. Actual requests in support tickets that describe the task. “Can this auto-suggest based on what I typed last time?” “Is there a way to summarise these into one paragraph?” “Can it just fill this out from the document I uploaded?”. When the requests describe an outcome, not a technology, that is a real signal.

4. You have a real bottleneck in your core funnel

If the step most users drop off at is “user has to type something hard” or “user has to make a judgement call with too little information”, AI can sometimes remove the friction. Think: onboarding that needs the user to fill in long descriptions, or a form that asks for a summary of something already provided elsewhere.

This is the class of AI feature that moves conversion. Measure the funnel, find the biggest drop, ask if AI removes the reason.

5. You are staring at an internal cost your customer will never see

Some of the best AI deployments are not user-facing features. They are internal: auto-triage of support tickets, auto-generation of first-draft sales emails, classification of incoming leads, summarisation of long customer calls for the team. Nobody outside the business notices, but the business gets cheaper to run.

These usually have the strongest ROI because you are not trying to change user behaviour. You are just paying less per task.

Five bad reasons to add AI

1. Your investor asked about your AI strategy

The answer to “what’s your AI strategy?” can be “we don’t have one yet because we haven’t found a problem where it moves the numbers”. Investors who have been in the industry for more than two years respect that answer more than they respect a feature built to answer the question.

If they do not respect that answer, that is a red flag on their part.

2. A competitor added it

Half the AI features shipped in 2024 and 2025 were built because someone else shipped one first. A large number of those features were quietly turned off in 2026. A competitor shipping something is not, on its own, a reason to ship it too. Check whether their AI feature actually moved anything before copying it.

3. It sounds good in a pitch deck

If the only place the AI feature is mentioned is in slides, not in your product roadmap scored against other work, you are building deck-ware. Deck-ware is expensive, because it still has to be maintained, and it still has to be live when someone who saw the deck opens your app.

4. You want to do something cool

Everyone in engineering wants to work with LLMs right now. The enthusiasm is real. But “I want to try this” is not a product decision; it is a skills decision. Do your AI experiments in a side project or a hack week, not in the main codebase.

5. Someone sold you on “AI-first”

If an agency, a consultant, or a vendor has told you your entire product needs to be “re-imagined with AI at the core”, you are being sold a rewrite they would like to invoice. Very few products need to be re-imagined. Most need one or two well-placed AI features, if any.

Three common failure modes

If you have spent any time around AI product launches in the last two years, you have seen these. They are the patterns that keep recurring.

The ChatGPT wrapper

A chat box is bolted onto the product. The chat has no specific grounding, no access to the user’s data, and no clear purpose. Usage is high in the first week, collapses by week four. The team ships it because chat is “what users expect now”. Users expected a useful assistant, not a worse version of ChatGPT inside your app.

A chat interface is only useful if it has context the user cannot easily type themselves and produces an output the user cannot easily produce themselves. Almost nothing meets both tests.

The search that got worse

Normal keyword search, replaced with “AI search” that returns an LLM-generated answer. Works well for ambiguous queries. Fails badly for queries where the user already knew the exact name of what they wanted. The replacement removes a tool that worked to add one that sometimes works.

If you do this, you need to keep both, or let users toggle, or be very sure your usage is skewed to fuzzy queries. Most products that did this without measuring first regressed their conversion.

The summary nobody asked for

Long-form content gets an auto-generated “TL;DR” at the top. Users report it as slightly helpful. Usage is low. The team counts it as a win because engagement is up, without checking whether the content itself got more or less read. Classic vanity-metric territory.

Summaries work when users are triaging a lot of content at speed (inbox, feed, ticket queue). They rarely add value when users already clicked through to read something specific.

What AI integration actually costs

Four real cost categories, in rough order of how often teams forget them:

Engineering time to integrate. A well-scoped LLM feature is usually two to six weeks of engineering work. A RAG pipeline (retrieval-augmented generation with your own data) is usually four to ten weeks. Agent-based workflows are more. Plan for the integration testing and the prompt tuning, which often takes longer than the first build.

Per-call inference cost. £0.001 to £0.05 per call for most LLM tasks via hosted APIs in 2026. Sounds small, multiplies fast. A feature that triggers on every user action can cost more per month than the rest of your infrastructure combined. Model this before shipping.
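A back-of-envelope model makes the multiplication concrete. Every input below is an illustrative assumption, not a real product's numbers; only the per-call price falls within the range quoted above:

```python
# Back-of-envelope monthly inference cost. All inputs are illustrative
# assumptions, chosen to show how a small per-call price compounds.
monthly_active_users = 10_000
calls_per_user_per_month = 200   # a feature that fires on every action adds up
cost_per_call = 0.01             # £, mid-range hosted-API price

monthly_cost = monthly_active_users * calls_per_user_per_month * cost_per_call
print(f"£{monthly_cost:,.0f} per month")  # £20,000 per month
```

A penny a call looks free until you multiply it by every user action across the whole base. Run this model with your own trigger rates before shipping.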

Evaluation and guardrails. You need a way to measure whether the AI output is good. That means building evals, monitoring quality over time, and adding guardrails (content filtering, hallucination detection, cost caps). Most teams skip this initially and regret it. Budget thirty percent of the feature build for this, not zero.
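A minimal version of “a way to measure whether the AI output is good” is a labelled set plus an accuracy check. In this sketch, `classify` is a hypothetical stand-in for your model call, and the labelled examples are invented for illustration:

```python
# Minimal eval harness sketch. `classify` is a hypothetical placeholder for
# a real model call; the labelled examples would come from human-tagged data.
def classify(text: str) -> str:
    # Placeholder logic standing in for an LLM call.
    return "invoice" if "invoice" in text.lower() else "contract"

labelled = [
    ("Invoice #4421 for March services", "invoice"),
    ("Master services contract, signed copy", "contract"),
    ("Invoice: consulting retainer", "invoice"),
]

correct = sum(classify(text) == label for text, label in labelled)
accuracy = correct / len(labelled)
print(f"accuracy: {accuracy:.0%}")
# Track this number over time, per prompt and per model version.
```

Even this crude a harness, run on every prompt change, catches most of the regressions teams otherwise discover from support tickets.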

Ongoing maintenance. Models change. Pricing changes. Your prompts rot as user behaviour evolves. AI features need more maintenance than a standard CRUD feature, not less. Expect fifteen to twenty percent of the initial build cost as annual maintenance.

A practical decision framework

Before writing a line of code, answer these four questions in writing:

  1. What is the task? Name it in a sentence. “Users tag uploaded documents by type.” Not “improve document management”.
  2. What does it cost today? In time, money, or conversion. If you cannot answer, measure for two weeks before deciding.
  3. What would AI do to that cost? Divide cost-per-task-today by expected cost-per-task-with-AI. If the answer is less than 5x, the feature probably is not worth the build.
  4. What happens when it is wrong? AI outputs are probabilistic. Can the user catch and fix errors, or does a bad output cause real harm (financial, legal, reputational)? Higher-stakes outputs need more guardrails, which means higher build cost.

If all four answers hold up, build the thing. If any one breaks, the feature is probably not ready yet.

One more thing. “AI” is a category, not a product decision. The question worth asking is never “should we add AI?” but “which specific task, with which specific model, for which specific users, with what specific cost structure?”. The more specific the answer, the more likely it ships something that works.

Where Walsh London fits in

Most of the AI work we do is not building new AI products from scratch. It is helping teams figure out which AI features are worth building, integrating LLMs into existing products (web or mobile), and wiring up the evaluation and monitoring that most teams skip.

The usual engagement looks like one of three things: a short advisory sprint to work out which AI features are worth building, an integration build wiring an LLM into an existing web or mobile product, or an evaluation and monitoring retrofit for an AI feature that has already shipped.


The short version

Add AI when:

  - the task already exists inside your product and is being done manually, at scale
  - you can write down what the task costs today, in time, money, or conversion
  - AI does the same task at least five times cheaper or faster
  - the user can catch and fix a bad output

Do not add AI when:

  - the brief is “AI” rather than a named task with a known cost
  - the only driver is an investor question, a competitor launch, a pitch deck, or a vendor
  - you cannot yet measure what the task costs today
  - a wrong output causes harm the user cannot catch

And if you are staring at a vendor who says you need to “re-imagine your product with AI at the core”, ask them what that costs, what it ships, and what it moves. Then make your own decision.