Error Messages Are the New Prompts: How We Made Our AI Agent Stop Asking for Help
When your AI agent says "Please try again," it's broken. Here's the pattern that turns tool errors into autonomous fallback directives — with production code you can copy.

Part 1 of Building Autonomous AI Agents - a series from Karigor AI Labs
TL;DR: In agentic systems, tool error messages are consumed by the LLM, not the user. If you write errors for humans ("Please try again"), your agent stops. If you write errors for the agent (named error type + context + NEXT_ACTIONS + prohibition), it recovers autonomously. Here's the pattern, the code, and a prompt you can copy.
Most AI agents fail silently. When a tool breaks, they apologize and stop, delegating the failure to the user. We fixed this with a pattern we call "error-as-directive": designing tool error messages as structured prompts that make the agent self-correct instead of giving up. Here's how we applied it to our Amazon product research agent, and how you can apply it to any AI agent you're building.
We built an AI agent that researches Amazon products - scrapes competitor listings, analyzes features, and generates optimized product pages. It worked great until Amazon blocked the scraper.
Here's what happened next:
User: Analyze this Amazon product: https://amazon.com/dp/B0EXAMPLE
Agent: I wasn't able to scrape that product page. The URL may be
invalid or the page may be temporarily unavailable.
Please try again or provide a different product URL.

The agent had three other tools it could have used. It had enough context to keep going. Instead, it asked the user for help and stopped.
This wasn't a model problem. The agent was smart enough to recover. It just didn't know it was allowed to.
We traced it to two root causes. First, the system prompt had a rule that said "If scraping fails, tell the user." The agent was following instructions. Second, the error message was written for humans - "Please try again" - and the LLM read it as something to relay, not something to act on.
That's when we realized: in agentic systems, error messages aren't read by users. They're read by the model. Every tool return, success or failure, gets injected into the LLM's context window and shapes what it does next.
If you write your errors for humans, the agent behaves like a customer service bot: apologize, suggest the user try again, stop.
If you write your errors for the agent, it behaves like an autonomous system: adapt, fall back, deliver.
Error messages are the new prompts. Design them with the same care you design your system prompt.
How AI Agent Tool Errors Actually Work#
Most AI agents follow the ReAct pattern - reason, act, observe, repeat. The model reasons about what to do, calls a tool, receives the result, and loops.
That "receives result" step is the key. Whether the tool succeeded or failed, the return value goes straight into the context window. The LLM doesn't distinguish between "tool output" and "instruction." It's all tokens.
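A minimal sketch of that loop makes the point concrete. The helper names (`agentLoop`, `callModel`) are ours, not from any framework: notice that a tool's return value lands in the message history the same way whether it succeeded or threw - the model sees only tokens, never an "error" flag.

```typescript
type Message = { role: "assistant" | "tool"; content: string };

type Decision = { tool?: string; args?: unknown; final?: string };

async function agentLoop(
  callModel: (history: Message[]) => Promise<Decision>,
  tools: Record<string, (args: unknown) => Promise<string>>,
): Promise<string> {
  const history: Message[] = [];
  for (let step = 0; step < 10; step++) {
    const decision = await callModel(history);
    if (decision.final !== undefined) return decision.final; // model is done
    let observation: string;
    try {
      observation = await tools[decision.tool!](decision.args);
    } catch (err) {
      // The error string becomes the observation - this is the text
      // this post is about designing carefully.
      observation = String(err);
    }
    // Success or failure, it's appended to context identically.
    history.push({ role: "tool", content: observation });
  }
  return "MAX_STEPS_REACHED";
}
```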
This is what the industry does with tool failures today:
| Approach | What happens | Result |
|---|---|---|
| Retry (most frameworks) | Rerun the same tool | Same failure if root cause persists |
| Error as observation (LangChain, CrewAI) | Surface error text to the LLM | LLM sees the error but may not know what to do |
| Conditional routing (LangGraph) | Graph edges route to fallback nodes | Requires pre-built graph architecture |
| Error as directive (our pattern) | Error return contains explicit next steps | Agent follows structured fallback autonomously |
Our approach, error as directive, makes the agent self-correcting by design.
Retry doesn't help when Amazon is actively blocking you. Error-as-observation gives the model a signal but no plan. Conditional routing works but requires you to anticipate every failure path in a graph upfront.
We wanted something simpler. What if the AI agent tool error itself told the agent exactly what to do?
Research backs this up: a critical survey of LLM self-correction (published in TACL by MIT Press) found that LLMs self-correct reliably only when given structured external feedback. A vague error like "Search failed: Unknown error" gives the model almost nothing. But a structured directive like this:
SEARCH_FAILED: Exa API returned no results
Query: "B0EXAMPLE Amazon product review specifications"
NEXT_ACTIONS: Rephrase using broader terms. Try the product
category instead of specific ASIN.
DO NOT ask the user. Proceed with data from other tools.

...gives it a complete recovery plan.
We landed on four components for every agent-directive error:
- Error type: A named label (SCRAPE_FAILED, SEARCH_EMPTY)
- Context: What the agent needs for recovery (ASIN, query, category)
- Next actions: Explicit tool calls or strategies to try
- Prohibition: What NOT to do ("DO NOT ask the user")
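The four components lend themselves to a small formatter, so every tool emits directives with the same shape. This is our illustrative sketch (the `AgentError` and `formatAgentError` names are ours, not from the codebase in this post):

```typescript
interface AgentError {
  type: string;                     // e.g. "SCRAPE_FAILED"
  context: Record<string, string>;  // e.g. { ASIN: "B0EXAMPLE" }
  nextActions: string[];            // explicit fallback steps, in order
  prohibition: string;              // what the agent must NOT do
}

function formatAgentError(e: AgentError): string {
  const context = Object.entries(e.context)
    .map(([key, value]) => `${key}: ${value}`)
    .join("\n");
  const actions = e.nextActions
    .map((action, i) => `${i + 1}. ${action}`)
    .join("\n");
  return (
    `${e.type}:\n` +
    `${context}\n` +
    `NEXT_ACTIONS (do these automatically, do NOT ask the user):\n` +
    `${actions}\n` +
    e.prohibition
  );
}
```

A formatter like this also makes the pattern enforceable in code review: a tool that returns a raw string instead of an `AgentError` stands out.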
The Pattern: Self-Correcting Error Messages (Before & After)#
This is the actual code transformation. Before and after, from our codebase.
Before: the error talks to the user#
// convex/agents/tools/scrapeTool.ts - BEFORE
if (!result.success) {
return `Failed to scrape product page: ${result.error || 'Unknown error'}
Please try again or verify the URL is correct.`;
}

The LLM reads "Please try again" - language addressed to a human. It does what any polite assistant would: relays the message and stops. The agent had three other tools available. It called none of them.
After: the error talks to the agent#
// convex/agents/tools/scrapeTool.ts - AFTER
if (!result.success) {
const asin = extractAsin(args.url);
return `SCRAPE_FAILED: Could not extract product data from this URL.
Reason: ${result.error || 'Amazon anti-bot protection or page unavailable'}
ASIN: ${asin}
NEXT_ACTIONS (do these automatically, do NOT ask the user):
1. Call searchWeb with query: "${asin} Amazon product review specifications price features"
2. Call searchAmazon with the product category and relevant keywords
3. Use the search results to build the competitor analysis
4. Generate the full report using whatever data you gathered
You MUST produce the complete report. Note in Data Sources that direct scraping was unavailable.`;
}

The LLM reads explicit instructions. It calls searchWeb, gathers alternative data, and produces the report. The user gets a result.
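The `extractAsin` helper above isn't shown in the post's snippet; a plausible sketch (our reconstruction, assuming standard Amazon URL shapes where the 10-character ASIN follows `/dp/` or `/gp/product/`):

```typescript
// Extract the ASIN from an Amazon product URL, or null if absent.
// Real ASINs are 10 alphanumeric characters.
function extractAsin(url: string): string | null {
  const match = url.match(/\/(?:dp|gp\/product)\/([A-Z0-9]{10})/i);
  return match ? match[1].toUpperCase() : null;
}
```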
Same pattern, applied to search tools#
// BEFORE
return `No results found for: "${args.query}"
Please try again with a different query.`;

// AFTER
return `SEARCH_EMPTY: No results found for: "${args.query}"
NEXT_ACTIONS: Rephrase using broader terms (e.g., general product
category instead of specific model).
Try related keywords or the product ASIN if available.
DO NOT ask the user. Proceed with data from other tools or your
category expertise.`;

The difference is stark. The "before" talks to the user. The "after" talks to the agent. Same tool, same failure - completely different behavior.
Here's the anatomy broken down:
SCRAPE_FAILED: ← Named error type (semantic signal)
Could not extract... ← Human-readable reason (for logs)
ASIN: B0EXAMPLE ← Context the agent needs for fallback
NEXT_ACTIONS: ← Explicit recovery instructions
1. Call searchWeb... ← Specific tool + specific query
2. Call searchAmazon... ← Cascading fallback
3. Use the search... ← Graceful degradation
4. Generate the full... ← Always deliver something
DO NOT ask the user.   ← Prohibition directive

Building a Fault-Tolerant AI Agent: Three Layers of Fallback#
Error messages are one layer. The full system has three, making this a production-grade, self-correcting AI agent architecture.
Layer 1: The scraper cascade#
Before the agent even sees an error, the scraping tool tries multiple approaches:
// Attempt 1: Crawl4AI (free, self-hosted)
const result = await ctx.runAction(internal.lib.crawl4ai.scrapeProduct, {
url: args.url,
});
if (result.success && result.title && result.title.length > 5) {
return formatScrapedResult(args.url, result);
}
// Attempt 2: Oxylabs E-Commerce API (paid fallback)
const oxylabsResult = await ctx.runAction(
internal.lib.oxylabs.scrapeAmazonProduct,
{ url: args.url }
);
if (oxylabsResult.success) {
return formatScrapedResult(args.url, oxylabsResult) +
'\n\n_Data source: Oxylabs E-Commerce API (structured data)_';
}
// Both failed - agent-directive error with NEXT_ACTIONS

Crawl4AI is free and self-hosted. If Amazon blocks it, Oxylabs takes over with structured product data. Only if both fail does the agent see the SCRAPE_FAILED directive.
Layer 2: The search fallback#
When scraping fails entirely, the NEXT_ACTIONS directive sends the agent to search tools. searchWeb finds review sites, cached listings, and comparison articles via Exa's neural search. searchAmazon looks for competitor products in the same category.
Even search snippets contain usable data: product names, price ranges, feature mentions, review counts.
Layer 3: The system prompt contract#
The error directives are half the pattern. The other half is the system prompt telling the agent it's autonomous:
## CRITICAL: Autonomous Operation & Fallback Chain
You are a FULLY AUTONOMOUS agent. You must NEVER:
- Ask the user to try again later
- Ask the user to verify URLs or provide alternatives
- Present "options" for the user to choose from
- Tell the user a tool failed and stop
When a tool fails, follow this fallback chain AUTOMATICALLY:
1. Scrape fails? → Use searchWeb and searchAmazon
2. Search limited? → Use partial data + category expertise
3. All tools fail? → Generate report from training knowledge
4. ALWAYS produce the full report

The system prompt says "never stop." The error messages say "here's exactly what to do next." Together, they create an agent that always delivers.
Data quality tiers#
The agent labels every report with its data provenance:
- Tier 1 - Live Data: All tools succeeded. Full live research.
- Tier 2 - Mixed Sources: Some tools failed. Report combines live search data with category expertise.
- Tier 3 - Expert Analysis: Tools unavailable. Report based on market knowledge. User should verify competitor data.
The user always gets a report. They also always know how much to trust it.
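Tier assignment can be mechanical. A sketch of how we'd derive the label from tool outcomes (the `classifyTier` name and signature are ours, for illustration):

```typescript
type DataTier =
  | "Tier 1 - Live Data"
  | "Tier 2 - Mixed Sources"
  | "Tier 3 - Expert Analysis";

// Given one boolean per tool call (true = succeeded), pick the
// provenance label to attach to the report.
function classifyTier(toolOutcomes: boolean[]): DataTier {
  const succeeded = toolOutcomes.filter(Boolean).length;
  if (toolOutcomes.length > 0 && succeeded === toolOutcomes.length) {
    return "Tier 1 - Live Data";
  }
  if (succeeded > 0) {
    return "Tier 2 - Mixed Sources";
  }
  return "Tier 3 - Expert Analysis";
}
```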
Why Self-Correcting Agents Work: What the Research Shows#
Three findings that validate this pattern:
LLMs need structured feedback to self-correct. The critical survey of LLM self-correction published in TACL (MIT Press, 2024) found that models self-correct reliably only with external structured feedback, not vague signals. Our NEXT_ACTIONS directives are exactly that: structured, external, injected at the failure point.
Most agent failures are context problems. Carnegie Mellon research (cited by Composio's 2025 AI Agent Report) found that 70% of agent failures come from missing context, not model limitations. The agent had the capability to recover. It just didn't have the right context. Our error directives fix that.
Compound failure rates demand fallback chains. If each tool is 95% reliable, chaining 3 tools drops overall success to ~86%. Our tiered fallback creates what's increasingly called a self-healing agent - one that adapts to failures automatically and degrades gracefully instead of failing outright. Even at Tier 3, the user gets a report.
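The arithmetic behind that claim, plus the effect of adding one fallback per step (assuming, for illustration, that failures are independent):

```typescript
const perTool = 0.95;

// Three tools chained in sequence: all must succeed.
const chainOf3 = Math.pow(perTool, 3); // ≈ 0.857 - the ~86% above

// With one independent fallback per step, a step fails only if
// BOTH the primary and the fallback fail.
const stepWithFallback = 1 - Math.pow(1 - perTool, 2); // 0.9975
const chainWithFallback = Math.pow(stepWithFallback, 3); // ≈ 0.993
```

One fallback per step takes a three-tool chain from roughly 86% to over 99% - which is why the cascade matters more than any single tool's reliability.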
Prompt: Make Any AI Agent Self-Correcting (Copy & Paste)#
You don't need to refactor your agent's error handling manually. Give this prompt to Claude Code, Cursor, Copilot, or any coding agent. It'll audit your tool returns and refactor them into agent-directive format.
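A minimal version of such a prompt, assembled from the four components described earlier (our wording - a sketch, not the exact prompt we use):

```
Audit every tool in this codebase. For each tool return that reports
a failure, rewrite it in agent-directive format:

1. Start with a named error type in CAPS (e.g. SCRAPE_FAILED, SEARCH_EMPTY).
2. Include the context the agent needs to recover (IDs, queries, categories).
3. Add a NEXT_ACTIONS section listing explicit fallback tool calls, in order.
4. End with a prohibition: "DO NOT ask the user. Proceed with available data."

Never return error text addressed to a human (e.g. "Please try again").
```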
That's it. Your agent's error messages are now prompts.
From Prototype to Production: Building an Autonomous AI Agent for E-Commerce#
We didn't come up with this pattern in a sandbox. It runs in production on Karigor AI Labs - a platform where businesses deploy autonomous AI agents that do real work.
Our Research Agent hits Amazon's anti-bot walls regularly. It doesn't care. It cascades through scrapers, search engines, and its own expertise - and always delivers. Three agents on the platform share this same pattern. None of them have ever asked a user to "please try again."
This pattern didn't come from a design doc. It came from things breaking.
Early on, our agents failed and we'd find out from users. So we wrote a rule: agents must be autonomous. Then we wrote five more rules. We called it our constitution - six principles that every commit has to respect. That constitution lives in the repo, not in a Notion page nobody reads.
We kept losing the same lessons. Someone would write a tool error the old way - "please try again" - and the agent would start asking users for help. So we started encoding what we learned into Claude Code skills. We have 15 of them now. When you touch agent code, the right skill loads automatically and tells you how error returns should look. The pattern from this post isn't a convention we hope people follow. It's in the skill file.
We're building this into a platform for e-commerce teams. Amazon product research is our first agent; Walmart, Shopify, and multi-marketplace research are next. The pattern from this post - autonomous operation, structured fallback, transparent quality tiers - applies to every agent we ship. If you're building AI agents for e-commerce or evaluating AI product research tools for your team, this is the agentic architecture that makes them production-ready.
On the infrastructure side, we lean hard on Convex components - managed pieces for agent orchestration, rate limiting, streaming, cost tracking. We didn't build a queue system or a billing pipeline. We plugged in components and spent our time on the thing that actually matters: how the agent behaves when something goes wrong.
What's Next: Building Autonomous AI Agents (Series)#
This is Part 1 of Building Autonomous AI Agents - a series where we share what we've learned building agents that actually work in production.
Coming up:
Part 2: We Wrote a Constitution for Our Codebase. Six rules that every commit has to follow. How we went from "best practices" nobody reads to enforceable principles that prevent entire categories of bugs.
Part 3: 15 Skills That Teach the AI How to Work on Itself. We use Claude Code to build our agents. Every time we learned something the hard way, we turned it into a skill. Now the AI loads the right lesson before it writes a line of code.
Part 4: Why We Chose Convex Components Over Building Our Own. Rate limiting, streaming, cost tracking, RAG - we plugged in managed components instead of building infrastructure.
Part 5: Billing AI Agents Is Harder Than You Think. Anniversary billing cycles, reserve-mode rate limits that let agents finish before you charge them, and an immutable credit ledger. The boring stuff that makes a platform real.
Frequently Asked Questions#
What is a self-correcting AI agent?#
A self-correcting AI agent detects when its tools fail and automatically recovers without human intervention. Instead of asking the user to "try again," it reads structured error directives, switches to fallback tools, and delivers results using whatever data it can gather. The key design principle: error messages are prompts for the agent, not messages for the user.
How do you handle tool errors in AI agents?#
Replace human-facing error messages with agent-directive format: a named error type (SCRAPE_FAILED), relevant context (ASIN, query), explicit NEXT_ACTIONS with specific fallback tool calls, and a prohibition directive (DO NOT ask the user). The agent reads these as instructions and recovers autonomously.
What is the difference between retry, fallback, and self-correction?#
Retry reruns the same tool hoping for a different result. Fallback switches to an alternative tool or data source. Self-correction combines both: the agent reads a structured error directive, tries alternative tools in a specific order, uses partial data when available, and always produces output, labeling it with a data quality tier so users know what to trust.
Can AI agents replace manual product research?#
AI agents can automate 80-90% of product research tasks: competitor analysis, feature extraction, pricing research, and listing optimization. When primary data sources are blocked (e.g., Amazon anti-bot), a well-built agent falls back through alternative sources and still delivers actionable results. Human review is recommended for final business decisions.
What is agentic commerce?#
Agentic commerce uses AI agents to automate e-commerce workflows: product research, listing optimization, competitor monitoring, and pricing strategy. Unlike chatbots that answer questions, agentic commerce systems take autonomous action: they research, analyze, and produce deliverables without waiting for human input at each step.
Sources#
Academic & Research Papers#
- When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs - MIT Press, TACL (2024)
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing - ICLR 2024
- MAR: Multi-Agent Reflexion Improves Reasoning Abilities in LLMs - arXiv
- A Self-Correcting Multi-Agent LLM Framework - Nature, npj Artificial Intelligence (2025)
Industry Analysis#
- The 2025 AI Agent Report: Why AI Pilots Fail in Production - Composio
- Diagnosing and Measuring AI Agent Failures - Maxim AI
- 8 AI Agent Metrics That Go Beyond Accuracy - Galileo
- 12 Failure Patterns of Agentic AI Systems - Concentrix
- AI Agent Evaluation Replaces Data Labeling as the Critical Path - VentureBeat
Agent Design Patterns#
- The Unreasonable Effectiveness of an LLM Agent Loop with Tool Use - Sketch.dev
- Designing Agentic Loops - Simon Willison
- Better Ways to Build Self-Improving AI Agents - Yohei Nakajima
- Errors + Graceful Failure - Google PAIR Guidebook
- ReAct Prompting - Prompt Engineering Guide
Framework Documentation#
- Advanced Error Handling in LangGraph - SparkCo
- Self-Correcting Chain: Managing Tool Failures in LangChain - Kamal Dhungana
- CrewAI Tasks Documentation - CrewAI
- OpenAI Agents SDK: Tools - OpenAI
- AutoGen Error Handling Gap - GitHub Issue #5272
