
Your AI Sales Agent Is Burning Money on Small Talk

Most AI sales agents treat every message like a complex negotiation — even "what's the price?" That's expensive. We analyzed 580+ real conversations and found 70% are predictable small talk. Here's how we fixed it.

Md. Mehedi Hasan
Founder, Karigor AI Labs

If you're running an AI sales agent on Messenger, WhatsApp, or any chat channel, you've probably noticed the bill climbing faster than the revenue.

You're not alone. We run commerce AI agents for clients, and we watched the same thing happen: conversations tripled, costs tripled, but sales only grew 2x. Something was wrong with the economics.

So we did what engineers do. We pulled the data.


The 70% Problem

We analyzed 580+ conversations from a live commerce agent handling skincare sales in Bangladesh. The agent runs on Claude Haiku 4.5 with extended thinking — a capable model that reasons through each response.

Here's what the first messages looked like:

  • 33% were Facebook auto-generated icebreakers ("Can you check the price of a product?"). The customer didn't even type these — Facebook's "Send Message" button generates them automatically.
  • 20% were one-word price asks ("দাম কত?" — "how much?", "pp", "price").
  • 9% were location questions ("Where are you located?").
  • 11% were greetings or single emojis.

70% of conversations started with a message the agent answered identically every time.

Yet every single one triggered a full LLM call. The model loaded an 8,000-token system prompt, generated reasoning tokens, considered the tone, and crafted a response — for a question it had already answered 33 times that day, word for word.

That's like hiring a consultant to answer the phone and say "our office hours are 9 to 5" — 170 times a day.


Where the Money Actually Goes

If you're using prompt caching (and you should be), your input costs are already low. Cached system prompts cost $0.10/MTok instead of $1.00/MTok — a 90% reduction.
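The input-side savings come from placing a cache breakpoint on the large, static system prompt. A minimal sketch of the Anthropic Messages API payload shape — the prompt text and model ID here are placeholders, not our production values:

```python
# Sketch: marking a large, static system prompt as cacheable via the
# Anthropic Messages API. SYSTEM_PROMPT is a placeholder (~8,000 tokens
# in production).
SYSTEM_PROMPT = "You are a skincare sales agent for ..."

def build_request(user_message: str) -> dict:
    """Build Messages API kwargs with a cache breakpoint on the system prompt."""
    return {
        "model": "claude-haiku-4-5",  # illustrative model ID
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                # Everything up to this breakpoint is cached; later calls that
                # reuse the identical prefix pay the cached-read rate instead
                # of the full input rate.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("দাম কত?")
```

Because every conversation shares the same system-prompt prefix, the cache hit rate on a busy agent is close to 100% after the first call.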

The real cost driver is thinking tokens.

Models with extended thinking or chain-of-thought reasoning generate internal reasoning before responding. These are billed as output tokens — the most expensive token type. For Claude Haiku 4.5, that's $5/MTok.

Our agent had a 2,048-token thinking budget. For a "what's the price?" question, it used maybe 300 tokens of actual reasoning. The rest was headroom — budget allocated but either wasted on unnecessary deliberation or simply unused ceiling.

Thinking tokens accounted for 78% of per-turn cost on cached conversations.
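To see why thinking dominates, here is the arithmetic with illustrative token counts (the counts are assumptions chosen to mirror the observed split, not our exact production numbers; the per-MTok prices are the ones quoted above):

```python
# Illustrative per-turn cost breakdown for a cached conversation.
PRICE_CACHED_INPUT = 0.10 / 1_000_000  # $/token, cached system prompt reads
PRICE_OUTPUT = 5.00 / 1_000_000        # $/token, visible reply AND thinking

cached_input_tokens = 8_000  # system prompt, served from cache
thinking_tokens = 1_300      # internal reasoning, billed as output
response_tokens = 200        # visible reply

cost_input = cached_input_tokens * PRICE_CACHED_INPUT      # $0.0008
cost_thinking = thinking_tokens * PRICE_OUTPUT             # $0.0065
cost_response = response_tokens * PRICE_OUTPUT             # $0.0010
total = cost_input + cost_thinking + cost_response

thinking_share = cost_thinking / total  # ~78% of the per-turn cost
```

With caching already in place, the 8,000-token prompt costs less than a tenth of what 1,300 thinking tokens cost. Shrinking the prompt further barely moves the needle; shrinking the thinking does.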

Most teams optimizing LLM costs focus on input: shorter prompts, better caching, fewer examples. That's necessary but not sufficient. The output side — especially thinking — is where the money quietly drains.


Two Fixes, One Afternoon

We shipped two changes and cut daily AI costs by ~39%.

Fix 1: Don't Call the LLM for Predictable Messages

For the 55% of first messages that match known patterns — auto-CTAs, price asks, location questions — we skip the LLM entirely. A simple pattern matcher checks the first message against a curated list. If it matches, a pre-built response goes out instantly.

Cost: $0.00. Latency: ~100ms instead of 2-3 seconds.

The pre-built responses aren't generic templates. We copied them from the LLM's own best responses — the ones it had already converged on after hundreds of identical conversations. The customer experience is indistinguishable.

The key safety constraint: this only applies to the first message. The moment the customer sends a follow-up, the full LLM takes over. The filter is a fast greeting, not a conversation replacement.

Fix 2: Scale Reasoning Budget by Conversation Depth

Not every turn needs the same thinking power. "What's the price?" doesn't need 2,048 tokens of reasoning. But "I want the combo but the delivery charge seems high and my friend said the cream didn't work" — that needs real intelligence.

We made the thinking budget dynamic:

  • Turns 0-2: 1,024-token budget (price lists, greetings, simple product info)
  • Turn 3+: Full 2,048-token budget (objections, orders, multi-product comparisons)

The turn count was already available in our dispatch pipeline — no extra database queries needed. Just one if statement in the model factory.
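The whole change can be sketched as follows — function names and the model ID are illustrative, and `thinking.budget_tokens` is the Messages API parameter that caps extended thinking:

```python
# Sketch of the turn-based thinking budget (names are illustrative).
FULL_BUDGET = 2_048
REDUCED_BUDGET = 1_024  # API minimum for extended thinking is 1,024

def thinking_budget(turn: int) -> int:
    """Early turns get a reduced reasoning budget; deeper turns get the full one."""
    return REDUCED_BUDGET if turn <= 2 else FULL_BUDGET

def build_model_config(turn: int) -> dict:
    """The one branch in the model factory, passed through to the API call."""
    return {
        "model": "claude-haiku-4-5",  # illustrative model ID
        "max_tokens": 4_096,          # must exceed the thinking budget
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget(turn)},
    }
```

Because the budget is a ceiling rather than a target, halving it on early turns mostly trims the unused headroom; simple questions were never using the full 2,048 anyway.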


The Results

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| Daily AI cost (170 conv/day) | $10.20 | $6.24 | -39% |
| First-message response time (matched) | 2-3s | ~100ms | -95% |
| Conversion rate | 4.9% | 4.9% | No change |
| Customer satisfaction signals | Baseline | Baseline | No change |

The agent is cheaper, faster on first response, and equally effective at selling. The quality metrics didn't move because we didn't change how the agent handles the conversations that matter — the 26% where real buying intent exists.


Why This Matters for Anyone Building Sales Agents

If you're building a commerce agent — whether on Messenger, WhatsApp, or your own chat widget — your conversation distribution probably looks similar to ours:

  • A large chunk of first messages are platform-generated or one-word queries
  • Most conversations don't convert (and that's normal for chat commerce)
  • The LLM is your most expensive infrastructure component
  • Your cost scales linearly with every conversation, not with every sale

The fix isn't to use a cheaper model. It's to use the model only when it adds value.

Prompt caching handles the input side. Dynamic thinking handles the output side. Quick responses handle the cases where you don't need the model at all.

These three layers stack. Each one makes the economics more sustainable as you scale.


Or Just Let Us Build It

Everything in this post is how we build commerce agents for clients at Karigor. Same optimizations, same care, your brand.

Book a call →


This post is based on real production data from a Karigor-managed commerce agent. For the full technical deep-dive with code and architecture diagrams, read 70% of Your AI Agent's Conversations Are Predictable. Act Like It.

Tags: ai-sales-agent, cost-optimization, commerce-ai, llm-costs, facebook-messenger, ai-efficiency

Ready to See Karigor in Action?

Book a personalized demo with our team. We'll show you how Karigor can transform your workflows.

No credit card required • 30-minute call • Custom demo