Now available

Understand Every Decision Your AI Agent Makes Instantly

Trace logic, detect failures, and improve AI agent performance with real-time observability. The only platform that learns how your agents actually behave.

Dashboard
Traces: 12,847 (+12%)
Latency: 1.24s (-8%)
Errors: 0.8% (-23%)
Cost: $847 (+5%)

TRACE       AGENT           COST    STATUS
trc_8f3a2b  support-agent   $0.023  success
trc_7e2b3c  research-agent  $0.089  warning
trc_6d1c4e  code-agent      $0.041  success
trc_5c0b5f  support-agent   $0.012  success
End-to-end trace timeline: See every step your agent takes
Live agent profiles: Metrics that update in real-time, intelligence that deepens over time
Latency & cost breakdown: Optimize performance per step

Use Cases

Understand why your agent does what it does

From support bots to research agents, trace every thought and decision. Know exactly what went wrong and why.

Customer Support

Pinpoint the exact moment your agent went wrong

Trace every thought, every decision, every retrieval. See the complete chain of reasoning and find exactly where the failure occurred. No more guessing.

  • Thought chain replay
  • Hallucination detection
  • Context tracing
Conversation Replay
User: How do I reset my password?
Agent: You can reset your password by clicking "Forgot Password" on the login page...
Grounded · 847 tokens · 0.89s
RAG Applications

See what your agent actually reads and remembers

Watch your agent's memory in action. See which documents it retrieves, which passages it focuses on, and whether it's actually using the context you gave it.

  • Memory inspection
  • Grounding scores
  • Source attribution
Retrieval Analysis
Retrieved Documents
docs/password-reset.md  0.94
docs/security-faq.md    0.87
docs/account-setup.md   0.62
Response fully grounded in sources
Autonomous Agents

Follow every step your agent takes

Watch your agent think in real-time. Visualize decision trees, track tool calls, and understand exactly why it chose one path over another. Catch runaway loops before they drain your API budget.

  • Decision tree view
  • Tool call tracking
  • Loop detection
Span Timeline
llm   llm:openai-agent     6.18s
llm   gpt-4o-mini          3.06s · 237 tok
tool  get_current_weather  105ms
tool  get_current_weather  1ms
tool  calculate            1ms
llm   gpt-4o-mini          3.08s · 435 tok
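
The span timeline above shows the same tool, get_current_weather, called more than once in a single run. As a rough illustration of what loop detection could look like, the sketch below counts identical tool calls and reports any that repeat too often. The span shape (`type`, `name`, `args`), the function name, and the threshold of 3 are assumptions made for this example. They are not Foil's schema or detection logic.

```javascript
// Hypothetical span shape: { type: 'llm' | 'tool', name: string, args?: object }.
// Illustrative sketch only; not Foil's actual loop detector.
function findRepeatedToolCalls(spans, maxRepeats = 3) {
  const counts = new Map();
  for (const span of spans) {
    if (span.type !== 'tool') continue;
    // Same tool name plus same serialized arguments counts as the same call.
    const key = `${span.name}:${JSON.stringify(span.args ?? {})}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Report every call signature seen at least maxRepeats times.
  return [...counts.entries()]
    .filter(([, count]) => count >= maxRepeats)
    .map(([key, count]) => ({ key, count }));
}

const spans = [
  { type: 'llm', name: 'gpt-4o-mini' },
  { type: 'tool', name: 'get_current_weather', args: { city: 'Paris' } },
  { type: 'tool', name: 'get_current_weather', args: { city: 'Paris' } },
  { type: 'tool', name: 'calculate', args: { expr: '2+2' } },
  { type: 'tool', name: 'get_current_weather', args: { city: 'Paris' } },
];

const repeated = findRepeatedToolCalls(spans);
console.log(repeated); // one entry: get_current_weather with count 3
```

A real detector would likely also look at call ordering and elapsed time, but even a simple counter like this can stop a run before repeated calls add up on the API bill.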
Content & Code Generation

Inspect every generation before it ships

Understand how your agent crafts each output. See the reasoning behind every word. Catch hallucinations, safety issues, and quality problems with deep inspection.

  • Generation breakdown
  • Safety checks
  • Feedback collection
Quality Signals
User Feedback: 4.2
Safety Score: 95%
Safe · On-brand · Factual

Ready to understand your AI agents?

Sign Up Talk to a Founder

Foil in Action

Real teams, real agents, real results

External-facing

Customer Support AI

A SaaS company deploys an AI support agent handling thousands of conversations daily. Foil monitors every conversation, tracking latency, tool usage, and quality scores in real time.

Key Insight

After a docs update, profile learning detected hallucinated answers. Alerting caught a 3x spike in "I don't know" responses within minutes.

Tracing Profiles Alerting
Internal

Document Processing & Compliance

A financial services firm processes documents through an AI pipeline — PDF parsing, classification, extraction, and compliance checks. Foil traces each stage end to end.

Key Insight

Anchored invariants enforce 97% accuracy and processing times under 30 seconds. The change detector flagged a 12% accuracy drift after a model update, before any compliance violation occurred.

Tracing Evaluations Anchors
External-facing

AI Onboarding Agent

A SaaS company deploys an agent that walks new users through product setup — answering questions, configuring settings, and escalating to humans when stuck. Foil monitors every onboarding session end-to-end.

Key Insight

Drift detection caught the agent routing 3× more users to human escalation after a docs update changed the setup flow — the agent's help guide was now outdated.

Tracing Alerting Drift Detection
Internal

Code Review & CI Triage

An engineering team uses AI to review PRs and triage CI failures. Foil monitors invocations, suggestion acceptance rates, and developer feedback.

Key Insight

A Sunday volume spike revealed a broken dependency causing cascading CI failures — burning API credits unnoticed until Foil flagged the anomaly.

Signals Alerting Analytics

Agent Learning

Your agents reveal themselves over time

Foil continuously monitors your agents in production — quantitative metrics update in real-time while AI-generated behavioral profiles deepen with every learning cycle. The result: observability that's always accurate and always getting smarter.

From trace 1

Live Metrics, Instantly

The moment your agent sends its first trace, Foil starts tracking latency, error rates, tool usage, and volume, updating every 60 seconds. By the time your first AI profile is generated at 50 traces, you already have a full operational picture.

Real-time latency & errors · Tool usage distribution · Volume & temporal patterns
From 50 traces

First Profile & Anchors

At 50 traces, Foil generates your agent's first behavioral profile — identity, tool patterns, error analysis, and insights — alongside health anchors: falsifiable claims like "error rate stays below 5%" that are continuously validated against live data.

Behavioral identity & analysis · Falsifiable health anchors · AI-generated insights
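
To make the idea of a falsifiable health anchor concrete, here is a minimal sketch that expresses claims such as "error rate stays below 5%" as data and checks them against current metrics. The anchor shape, the metric names (`errorRate`, `dailyVolume`), and the `validateAnchors` helper are hypothetical. The source does not show how Foil stores or evaluates anchors.

```javascript
// Hypothetical anchor shape; Foil's real anchor format is not shown in the source.
const anchors = [
  { claim: 'Error rate stays below 5%', metric: 'errorRate', op: '<', threshold: 0.05 },
  { claim: 'Daily volume exceeds 200 traces', metric: 'dailyVolume', op: '>', threshold: 200 },
];

const comparators = {
  '<': (value, threshold) => value < threshold,
  '>': (value, threshold) => value > threshold,
};

// Check each claim against the latest metrics; a missing metric counts as failing.
function validateAnchors(anchorList, liveMetrics) {
  return anchorList.map((anchor) => {
    const value = liveMetrics[anchor.metric];
    const passing =
      typeof value === 'number' && comparators[anchor.op](value, anchor.threshold);
    return { claim: anchor.claim, value, passing };
  });
}

// Sample metric values chosen for illustration.
const results = validateAnchors(anchors, { errorRate: 0.0, dailyVolume: 240 });
const passingCount = results.filter((r) => r.passing).length;
console.log(`${passingCount}/${results.length} anchors passing`); // 2/2 anchors passing
```

Because each anchor is a plain comparison, it can be re-run on every metrics refresh, and a failing anchor points directly at the claim that no longer holds.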
50 → convergence

Rapid Refinement

The profile re-learns at geometric intervals (125, 313, 783+ traces), refining with each cycle. When 2 consecutive cycles find no material changes, the profile converges and transitions to steady-state.

Geometric learning cycles · Convergence detection · Progressively deeper analysis
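
The quoted checkpoints (125, 313, 783) are consistent with multiplying the previous checkpoint by 2.5 and rounding, starting from the first profile at 50 traces. That ratio is an inference from the numbers rather than something the page states, so treat the sketch below as one plausible reading. The function names and the list of per-cycle change flags are likewise hypothetical.

```javascript
// Assumption: each checkpoint is the previous one times 2.5, rounded to the
// nearest integer. The ratio is inferred from the quoted numbers, not documented.
function learningCheckpoints(firstProfileAt = 50, ratio = 2.5, cycles = 3) {
  const checkpoints = [];
  let current = firstProfileAt;
  for (let i = 0; i < cycles; i++) {
    current = Math.round(current * ratio);
    checkpoints.push(current);
  }
  return checkpoints;
}

// A profile converges once `required` consecutive cycles report no material change.
// `cycleChanged` holds one boolean per learning cycle (true = something changed).
function hasConverged(cycleChanged, required = 2) {
  let quietStreak = 0;
  for (const changed of cycleChanged) {
    quietStreak = changed ? 0 : quietStreak + 1;
    if (quietStreak >= required) return true;
  }
  return false;
}

console.log(learningCheckpoints()); // [ 125, 313, 783 ]
console.log(hasConverged([true, false, true, false, false])); // true
console.log(hasConverged([true, false, true])); // false
```

Geometric spacing means early cycles happen quickly while the profile is still changing, and later cycles become progressively rarer as it stabilizes.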
Converged

Drift Detection & Self-Healing

Once converged, learning becomes change-driven. Foil monitors for behavioral drift — distribution shifts, rate changes, new tool or error types — and automatically re-learns when something meaningful changes. If enough anchors break, it re-enters rapid learning to self-correct.

Automated drift detection · Change-driven re-learning · Anchor-break self-healing
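
One common way to quantify a distribution shift in tool usage is total variation distance between a baseline and a recent window. The sketch below uses that measure and also lists tools that never appeared in the baseline. The page does not say which statistic Foil uses, so the measure, the 0.2 threshold, and the input shape (tool name mapped to call count) are all assumptions for illustration.

```javascript
// Convert raw call counts into shares that sum to 1.
function normalize(counts) {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  const shares = {};
  for (const [tool, n] of Object.entries(counts)) {
    shares[tool] = total ? n / total : 0;
  }
  return shares;
}

// Illustrative drift check, not Foil's algorithm: total variation distance
// between two tool-usage distributions, plus any tools unseen in the baseline.
function detectDrift(baselineCounts, currentCounts, threshold = 0.2) {
  const baseline = normalize(baselineCounts);
  const current = normalize(currentCounts);
  const tools = new Set([...Object.keys(baseline), ...Object.keys(current)]);

  let distance = 0;
  for (const tool of tools) {
    distance += Math.abs((baseline[tool] ?? 0) - (current[tool] ?? 0));
  }
  distance /= 2; // total variation distance lies in [0, 1]

  const newTools = Object.keys(current).filter((tool) => !(tool in baseline));
  return { distance, newTools, drifted: distance > threshold || newTools.length > 0 };
}

// Sample counts invented for the example.
const baseline = { search_inventory: 50, check_slots: 30, book_appt: 20 };
const current = { search_inventory: 20, check_slots: 30, transfer_call: 50 };
const report = detectDrift(baseline, current);
console.log(report.newTools, report.drifted);
```

Here half of the usage has moved to a tool the baseline never saw, so both the distance and the new-tool check would trigger a re-learn under these assumed rules.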
Agent Profiles

Every agent gets a living profile

Foil builds a behavioral profile for each agent that separates always-fresh quantitative metrics from AI-generated behavioral intelligence. Live metrics like latency, error rates, and tool distribution update continuously. AI insights deepen as your agent processes more traces.

  • Live metrics from the first trace — latency, errors, tool usage updated in real-time
  • AI behavioral profiles bootstrap from just 50 traces
  • Health anchors: falsifiable claims that are continuously validated
  • Drift detection alerts when behavior shifts from baseline
  • Lock specific profile fields to prevent auto-updates
support-agent Steady State
240 traces · 2/2 anchors passing
Auto Learning
Identity

Customer support agent for vehicle inquiries, test drives, and dealership operations

high confidence established
Behavioral Overview
Daily Volume: 240
Error Rate: 0.0%
Median Latency: 1.2s
Active Hours: 9
Tool Usage
search_inventory 46.7% · gpt-4o-mini 29.8% · check_slots 13.3% · transfer_call 9.6% · book_appt 6.3%
Insights AI-derived
High daily volume (240 traces) indicates robust customer engagement
Predominant use of search_inventory (46.7%) suggests most interactions check vehicle availability
Health Anchors 2/2 passing
Error rate stays below 1%
Daily volume exceeds 200 traces
Confidence: low
Smart Evaluations

Profile-powered evaluations that learn your agent

Every trace is evaluated against 9 built-in checks — hallucination, PII, prompt injection, and more. But unlike generic monitoring tools, Foil's evaluations use your agent's behavioral profile as context. A response that's normal for one agent might be anomalous for another. Foil knows the difference.

  • 9 built-in evaluations: hallucination, PII, injection, jailbreak, quality, frustration, satisfaction, stuck detection, NSFW
  • Evaluations use live agent context — tool patterns, error baselines, behavioral norms
  • Create custom evaluations with few-shot examples from your own traces
  • Anomaly detection powered by agent profiles — catch deviations generic tools miss
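
The claim that a response can be normal for one agent and anomalous for another can be illustrated with a per-agent baseline. In this sketch the same 5.5-second latency is judged against each agent's own mean and standard deviation. The profile fields, the baseline values, and the three-standard-deviation rule are assumptions made for the example. They are not taken from Foil.

```javascript
// Hypothetical per-agent baselines: mean and standard deviation of latency in seconds.
const profiles = {
  'support-agent': { latencyMean: 1.2, latencyStd: 0.3 },
  'research-agent': { latencyMean: 6.0, latencyStd: 1.5 },
};

// Flag a trace when its latency is more than `zLimit` standard deviations
// away from that agent's own baseline.
function isAnomalous(trace, agentProfiles, zLimit = 3) {
  const profile = agentProfiles[trace.agent];
  if (!profile || profile.latencyStd === 0) return false; // no usable baseline yet
  const z = Math.abs(trace.latencySeconds - profile.latencyMean) / profile.latencyStd;
  return z > zLimit;
}

const slowSupport = isAnomalous({ agent: 'support-agent', latencySeconds: 5.5 }, profiles);
const slowResearch = isAnomalous({ agent: 'research-agent', latencySeconds: 5.5 }, profiles);
console.log(slowSupport, slowResearch); // true false
```

A single global latency threshold would either miss the slow support trace or flag every research trace, which is the gap that profile-based context is meant to close.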
Evaluation Pipeline
Incoming Trace trc_9f2a3b

support-agent · "How do I track my order?"

Agent Profile Context

support-agent · tool patterns, error baselines, behavioral norms

Built-in Evaluations
Hallucination
PII
Injection
Quality
NSFW
Jailbreak
Stuck/Loop
Frustration
Satisfaction
Custom Evaluations (Pro)
Brand Voice Compliance: Pass
Refund Policy Accuracy: Warning

Platform Features

The deepest visibility into agent behavior

Purpose-built for AI agents. See what no other tool can show you.

Full Thought Tracing

See every step your agent takes: each LLM call, tool invocation, memory read, and branching decision. Understand its complete reasoning chain.

Hallucination Detection

Catch the moment your agent makes things up. Flag responses that contradict context or fabricate sources, in real-time.

Safety Monitoring

Detect policy violations, harmful outputs, and prompt injections before they reach users. See why they happened.

Cost Analytics

Track token usage and API spend down to individual decisions. Find exactly what's burning through your budget.

User Signals

Link user feedback directly to agent decisions. Understand which reasoning paths lead to good or bad outcomes.

Failure Replay

Replay any failed execution step-by-step. Rewind to the exact moment things went wrong and see every detail.

Deep Search

Find any conversation instantly

Natural language search across all your traces. Ask questions like "show me all conversations where the agent mentioned a refund" and get instant, semantically relevant results.

conversations about password resets
AI
trc_abc1 · 2h ago

I can help you reset your password...

98%
trc_def2 · 5h ago

Your password has been reset successfully...

94%
trc_ghi3 · 1d ago

For security, we require email verification...

89%
847 results found
Cost Intelligence

Know exactly where your budget goes

Track costs per request, model, and agent. Get budget alerts, monthly projections, and identify your most expensive workflows before they drain your budget.

$1,247 This month
gpt-4o: $892
gpt-4o-mini: $234
claude-3.5: $121
Projected monthly: $2,494
Budget remaining: $753
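
The figures in this panel are consistent with a simple linear run-rate projection: $1,247 spent by the midpoint of the month projects to $2,494, and $753 remaining implies a $2,000 budget. The page does not state how projections are calculated, so the sketch below is only one way to reproduce these numbers. Day 15 of a 30-day month, the $2,000 budget, and the function name are assumptions.

```javascript
// Linear run-rate projection: assumes spend continues at the month-to-date daily average.
// Multiplying before dividing keeps the arithmetic exact for whole-dollar inputs.
function projectMonthlySpend(spendToDate, dayOfMonth, daysInMonth) {
  return (spendToDate * daysInMonth) / dayOfMonth;
}

const byModel = { 'gpt-4o': 892, 'gpt-4o-mini': 234, 'claude-3.5': 121 };
const spendToDate = Object.values(byModel).reduce((sum, cost) => sum + cost, 0);

// Assumed values chosen to match the panel: day 15 of 30, $2,000 budget.
const projected = projectMonthlySpend(spendToDate, 15, 30);
const budget = 2000;
const remaining = budget - spendToDate;

console.log({ spendToDate, projected, remaining, overBudget: projected > budget });
// { spendToDate: 1247, projected: 2494, remaining: 753, overBudget: true }
```

Under these assumptions the projection exceeds the budget well before the remaining balance reaches zero, which is the kind of early warning a budget alert is meant to give.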

Integration

Up and running in minutes

Built on OpenTelemetry. Zero code changes to your LLM calls. Agent profiles build automatically.

01

Install the SDK

npm install @foil/foil-js or pip install foil-ai. Built on OpenTelemetry for automatic instrumentation of all LLM calls.

npm install @foil/foil-js
02

Initialize Foil

One call at app startup. Foil auto-instruments your LLM calls via OpenTelemetry, so no code changes are needed.

Foil.init({ apiKey, agentName })
03

Use your LLM as normal

Every OpenAI, Anthropic, or other LLM call is automatically traced. Agent profiles build from real usage.

// Calls are traced automatically!
agent.js
const { Foil } = require('@foil/foil-js/otel');
const OpenAI = require('openai');

// Initialize Foil (do this once at app startup)
Foil.init({
  apiKey: process.env.FOIL_API_KEY,
  agentName: 'my-first-agent',
});

// Use OpenAI as normal - it's automatically traced!
// The client reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

// CommonJS has no top-level await, so the call runs inside an async function.
async function main() {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'What is the capital of France?' }],
  });

  console.log(response.choices[0].message.content);
  // ↑ This call was automatically traced to Foil!
}

main().catch(console.error);

Pricing

Simple, transparent pricing

Get started with a free trial. Upgrade as you grow.

Pro Trial

Free

14 days of Pro features

  • All Pro features included
  • 10,000 spans included
  • No credit card required
Start Free Trial

Starter

$49/mo

$0.001 / interaction

  • Unlimited spans, agents & retention
  • Alerts
  • Exports
  • Deep Search
  • Email support
Get Started

Pro

14-day free trial
$149/mo

$0.005 / interaction

  • Everything in Starter, plus:
  • Model training on your data
  • Deep Search + Semantic Search + Smart Search
  • Prompts
  • SSO & RBAC
  • Priority support
Start Free Trial

Stop guessing. Start understanding.

Real-time metrics from trace one. Behavioral intelligence that deepens over time. Evaluations that know your agent.