Trace logic, detect failures, and improve AI agent performance with real-time observability.
The only platform that learns how your agents actually behave.
| TRACE | AGENT | DURATION | TOKENS | COST | STATUS |
|---|---|---|---|---|---|
| trc_8f3a2b | support-agent | 1.24s | 2,847 | $0.023 | success |
| trc_7e2b3c | research-agent | 3.87s | 8,234 | $0.089 | warning |
| trc_6d1c4e | code-agent | 2.13s | 4,521 | $0.041 | success |
| trc_5c0b5f | support-agent | 0.89s | 1,203 | $0.012 | success |
| TRACE | AGENT | TYPE | FEEDBACK | TIME |
|---|---|---|---|---|
| trc_8f3a2b | support-agent | thumbs | Positive | 2m ago |
| trc_7e2b3c | research-agent | rating | | 5m ago |
| trc_6d1c4e | code-agent | thumbs | Negative: "Incorrect syntax" | 8m ago |
| trc_5c0b5f | support-agent | rating | | 12m ago |
support-agent exceeded 15% hallucination threshold
research-agent cost 340% above daily average
Response style shifted from professional to casual
code-agent avg response time >5s for last 10 requests
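Alerts like the ones above boil down to threshold checks over recent traces. The sketch below is illustrative only: the function names, parameters, and sample numbers are assumptions, not Foil's actual rule format.

```javascript
// Hypothetical alert helpers; names and sample values are illustrative.

// Percent by which `value` exceeds `baseline` (340 means 4.4x the baseline).
function percentAbove(value, baseline) {
  return ((value - baseline) * 100) / baseline;
}

// True when the mean of the last `n` durations (in seconds) exceeds `limitSec`.
function avgLatencyExceeds(durationsSec, n, limitSec) {
  const recent = durationsSec.slice(-n);
  if (recent.length < n) return false; // not enough requests yet
  const mean = recent.reduce((sum, d) => sum + d, 0) / recent.length;
  return mean > limitSec;
}

console.log(percentAbove(44, 10)); // 340
console.log(avgLatencyExceeds(Array(10).fill(6), 10, 5)); // true
```

Requiring a full window of `n` requests before firing is one way to avoid alerting on the first slow call.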
Use Cases
From support bots to research agents, trace every thought and decision. Know exactly what went wrong and why.
Trace every thought, every decision, every retrieval. See the complete chain of reasoning and find exactly where the failure occurred. No more guessing.
Watch your agent's memory in action. See which documents it retrieves, which passages it focuses on, and whether it's actually using the context you gave it.
Watch your agent think in real time. Visualize decision trees, track tool calls, and understand exactly why it chose one path over another. Catch runaway loops before they drain your API budget.
Understand how your agent crafts each output. See the reasoning behind every word. Catch hallucinations, safety issues, and quality problems with deep inspection.
Foil in Action
A SaaS company deploys an AI support agent handling thousands of conversations daily. Foil monitors every conversation, tracking latency, tool usage, and quality scores in real time.
Key Insight
After a docs update, profile learning detected hallucinated answers. Alerting caught a 3x spike in "I don't know" responses within minutes.
A financial services firm processes documents through an AI pipeline — PDF parsing, classification, extraction, and compliance checks. Foil traces each stage end to end.
Key Insight
Anchored invariants enforce 97% accuracy and <30s processing. Change detector flagged 12% accuracy drift after a model update before any compliance violation.
A SaaS company deploys an agent that walks new users through product setup: answering questions, configuring settings, and escalating to humans when stuck. Foil monitors every onboarding session end to end.
Key Insight
Drift detection caught the agent routing 3× more users to human escalation after a docs update changed the setup flow — the agent's help guide was now outdated.
An engineering team uses AI to review PRs and triage CI failures. Foil monitors invocations, suggestion acceptance rates, and developer feedback.
Key Insight
A Sunday volume spike revealed a broken dependency causing cascading CI failures — burning API credits unnoticed until Foil flagged the anomaly.
Agent Learning
Foil continuously monitors your agents in production. Quantitative metrics update in real time, while AI-generated behavioral profiles deepen with every learning cycle. The result: observability that's always accurate and always getting smarter.
The moment your agent sends its first trace, Foil starts tracking latency, error rates, tool usage, and volume, updating every 60 seconds. By the time your AI profile is generated at 50 traces, you already have a full operational picture.
At 50 traces, Foil generates your agent's first behavioral profile — identity, tool patterns, error analysis, and insights — alongside health anchors: falsifiable claims like "error rate stays below 5%" that are continuously validated against live data.
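One way to picture a health anchor is as a named claim paired with a check that live metrics can falsify. The sketch below assumes a hypothetical anchor shape and made-up metric values; it is not Foil's internal representation.

```javascript
// Hypothetical anchor shape: a human-readable claim plus a predicate over live metrics.
const anchors = [
  { claim: 'error rate stays below 5%', holds: (m) => m.errorRate < 0.05 },
  { claim: 'p95 latency stays below 30s', holds: (m) => m.p95LatencySec < 30 },
];

// Returns the claims that the current metrics falsify.
function brokenAnchors(anchorList, metrics) {
  return anchorList.filter((a) => !a.holds(metrics)).map((a) => a.claim);
}

const liveMetrics = { errorRate: 0.08, p95LatencySec: 12 }; // made-up sample values
console.log(brokenAnchors(anchors, liveMetrics));
// prints [ 'error rate stays below 5%' ]
```

Keeping each anchor as a plain predicate means the same list can be re-evaluated every time the metrics refresh.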
The profile re-learns at geometric intervals (125, 313, 783+ traces), refining with each cycle. When 2 consecutive cycles find no material changes, the profile converges and transitions to steady-state.
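The thresholds quoted above are consistent with multiplying the previous threshold by 2.5 and rounding up. That ratio is inferred from the numbers, not a documented rule, so treat this as a sketch of the schedule and the convergence test rather than Foil's implementation.

```javascript
// Assumption: each threshold is the previous one times 2.5, rounded up.
// This reproduces 50 -> 125 -> 313 -> 783; the real rule may differ.
function relearnThresholds(firstProfileAt, cycles, ratio = 2.5) {
  const thresholds = [];
  let current = firstProfileAt;
  for (let i = 0; i < cycles; i++) {
    current = Math.ceil(current * ratio);
    thresholds.push(current);
  }
  return thresholds;
}

// Converged once the last `needed` cycles all report no material change.
function hasConverged(cycleChanged, needed = 2) {
  if (cycleChanged.length < needed) return false;
  return cycleChanged.slice(-needed).every((changed) => !changed);
}

console.log(relearnThresholds(50, 3)); // [ 125, 313, 783 ]
console.log(hasConverged([true, false, false])); // true
```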
Once converged, learning becomes change-driven. Foil monitors for behavioral drift — distribution shifts, rate changes, new tool or error types — and automatically re-learns when something meaningful changes. If enough anchors break, it re-enters rapid learning to self-correct.
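The change-driven phase can be sketched as a small decision function. The names below and the `rapidThreshold` parameter are hypothetical; the text only says "enough" anchors, so the default of 3 is an assumption.

```javascript
// Tool (or error) types seen recently that the baseline profile has never seen.
function newTypes(baseline, recent) {
  const known = new Set(baseline);
  return [...new Set(recent)].filter((t) => !known.has(t));
}

// Hypothetical phase decision for a converged profile.
// `rapidThreshold` is an assumed value, not a documented one.
function nextPhase(unseenTypes, brokenAnchorCount, rapidThreshold = 3) {
  if (brokenAnchorCount >= rapidThreshold) return 'rapid-learning';
  if (unseenTypes.length > 0) return 'relearn';
  return 'steady-state';
}

const unseen = newTypes(['search', 'lookup_order'], ['search', 'refund', 'refund']);
console.log(unseen); // [ 'refund' ]
console.log(nextPhase(unseen, 0)); // relearn
console.log(nextPhase([], 4)); // rapid-learning
```

Only new tool types are modeled here; distribution shifts and rate changes would need their own signals.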
Foil builds a behavioral profile for each agent that separates always-fresh quantitative metrics from AI-generated behavioral intelligence. Live metrics like latency, error rates, and tool distribution update continuously. AI insights deepen as your agent processes more traces.
Customer support agent for vehicle inquiries, test drives, and dealership operations
Every trace is evaluated against 9 built-in checks — hallucination, PII, prompt injection, and more. But unlike generic monitoring tools, Foil's evaluations use your agent's behavioral profile as context. A response that's normal for one agent might be anomalous for another. Foil knows the difference.
support-agent · "How do I track my order?"
research-agent · tool patterns, error baselines, behavioral norms
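To illustrate why the same response can be normal for one agent and anomalous for another, here is a toy check that scores a value against each agent's own baseline. The z-score test and the baseline numbers are stand-ins chosen for illustration, not Foil's evaluation method.

```javascript
// Hypothetical per-agent baselines: mean and standard deviation of response length in tokens.
const profiles = {
  'support-agent': { mean: 150, std: 50 },
  'research-agent': { mean: 2000, std: 600 },
};

// Flags a response whose length is more than `zLimit` standard deviations
// away from that agent's own mean.
function isAnomalous(agent, tokens, zLimit = 3) {
  const { mean, std } = profiles[agent];
  return Math.abs(tokens - mean) / std > zLimit;
}

console.log(isAnomalous('support-agent', 1800)); // true
console.log(isAnomalous('research-agent', 1800)); // false
```

An 1,800-token reply is far outside the support agent's usual range but well within the research agent's.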
Platform Features
Purpose-built for AI agents. See what no other tool can show you.
See every step your agent takes: each LLM call, tool invocation, memory read, and branching decision. Understand its complete reasoning chain.
Catch the moment your agent makes things up. Flag responses that contradict context or fabricate sources, in real time.
Detect policy violations, harmful outputs, and prompt injections before they reach users. See why they happened.
Track token usage and API spend down to individual decisions. Find exactly what's burning through your budget.
Link user feedback directly to agent decisions. Understand which reasoning paths lead to good or bad outcomes.
Replay any failed execution step-by-step. Rewind to the exact moment things went wrong and see every detail.
Natural language search across all your traces. Ask questions like "show me all conversations where the agent mentioned a refund" and get instant, semantically relevant results.
I can help you reset your password...
Your password has been reset successfully...
For security, we require email verification...
Track costs per request, model, and agent. Get budget alerts, monthly projections, and identify your most expensive workflows before they drain your budget.
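As a rough illustration of per-agent cost tracking and monthly projection, the sketch below aggregates the four sample traces shown at the top of this page. The field names and the linear projection are assumptions; costs are stored in thousandths of a dollar so the sums stay exact.

```javascript
// Rows mirror the sample trace table; costMilli is cost in thousandths of a dollar.
const traces = [
  { id: 'trc_8f3a2b', agent: 'support-agent', costMilli: 23 },
  { id: 'trc_7e2b3c', agent: 'research-agent', costMilli: 89 },
  { id: 'trc_6d1c4e', agent: 'code-agent', costMilli: 41 },
  { id: 'trc_5c0b5f', agent: 'support-agent', costMilli: 12 },
];

// Sums cost per agent.
function costByAgent(rows) {
  const totals = {};
  for (const { agent, costMilli } of rows) {
    totals[agent] = (totals[agent] || 0) + costMilli;
  }
  return totals;
}

// Linear projection: assumes the rest of the month matches the daily average so far.
function projectMonthly(spentSoFar, daysElapsed, daysInMonth) {
  return (spentSoFar / daysElapsed) * daysInMonth;
}

console.log(costByAgent(traces));
// prints { 'support-agent': 35, 'research-agent': 89, 'code-agent': 41 }
console.log(projectMonthly(120, 10, 30)); // 360
```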
Integration
Built on OpenTelemetry. Zero code changes to your LLM calls. Agent profiles build automatically.
Install with npm install @foil/foil-js (Node.js) or pip install foil-ai (Python). The SDK is built on OpenTelemetry and automatically instruments all LLM calls.
One call at app startup. Foil auto-instruments your LLM calls via OpenTelemetry, so no code changes are needed.
Every OpenAI, Anthropic, or other LLM call is automatically traced. Agent profiles build from real usage.
```javascript
const { Foil } = require('@foil/foil-js/otel');
const OpenAI = require('openai');

// Initialize Foil (do this once at app startup)
Foil.init({
  apiKey: process.env.FOIL_API_KEY,
  agentName: 'my-first-agent',
});

// Top-level await is not available in CommonJS, so wrap the call in an async function
async function main() {
  // Use OpenAI as normal; the call is traced automatically
  const openai = new OpenAI();
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'What is the capital of France?' }],
  });
  console.log(response.choices[0].message.content);
  // ↑ This call was automatically traced to Foil!
}

main().catch(console.error);
```
Pricing
Get started with a free trial. Upgrade as you grow.
14 days of Pro features
$0.001 / interaction
$0.005 / interaction
Real-time metrics from trace one. Behavioral intelligence that deepens over time. Evaluations that know your agent.