Now Open for Sign-Ups

The manager your
AI agents are missing.

Real-time feedback while you iterate. Automatic drift detection once you're stable. Foil learns how your agents should behave — and tells you when they don't.

$ npx @getfoil/foil-js wizard 1-command setup

Trusted by teams shipping AI agents in production

Add Foil to your existing codebase with one command.

~/my-project foil wizard
$ npx @getfoil/foil-js wizard
 
Foil Wizard Setup
 
? API Key: ········
? Agent name (default: support-agent):
? Agent description: Customer support chatbot
? Dry run? (y/N): N
 
Foil Wizard
 
Scanned project structure
Detected: OpenAI · Express.js
Pattern: Conversational HTTP server
Created foil.js
Modified index.js — added Foil import
Modified routes/chat.js — added tracing
Updated .env.example
 
Setup complete! 1 agent instrumented
chat-session
3/12/2026, 1:55:14 AM
15.40s 1,221 → 124 (Σ 1,345)
Preview Scores
Formatted JSON
Full conversation flow (7 messages)
S System
U User

return order 1001

A Assistant gpt-4o

To assist with your return request, I'll first verify the status and details of order ID 1001. Let me check that for you.

346+45 tokens (391) · 1.75s

T lookup_order 0ms
Input
{ "order_id": "1001" }
Output
{ "error": "Order 1001 not found" }
A Assistant gpt-4o

It seems there is no order with the ID 1001 in our system. Could you please double-check the order ID and provide the correct one? This will help me assist you effectively with your return request.

412+43 tokens (455) · 982ms

U User

return

A Assistant gpt-4o

To proceed with your return request, I'll need the correct order ID. Could you please provide it? This will help me look up the order details and take the necessary actions.

398+38 tokens (436) · 754ms

Critical Success 78.2% Latency 4.95s Attention 27
Last 7d Production
support-agent 08e52f23c8887...
18.9s
DURATION
$0.03
COST
1,345
TOKENS
5
STEPS
1
TOOL CALLS
3
LLM CALLS
Evaluation watch · 2 issues flagged
FAIL Quality score below threshold 42% < 70%
WARN Possible hallucination detected confidence 0.34
Execution Flow
AGENT chat-session 15.4s
LLM openai.chat gpt-4o 1.7s
INPUT: return order 1001
OUTPUT: I'll first verify the status and details of order ID 1001...
TOOL lookup_order 0ms
INPUT: {"order_id": "1001"}
OUTPUT: {"error": "Order 1001 not found"}
Flag Needs Action Looks Good
support-agent Steady State
247 traces · 2/2 anchors passing
12,847
Total Traces
0.8%
Error Rate
1.24s
Med. Latency
9
Active Hours
Tool Usage
search_orders 46.7% · gpt-4o 29.8% · check_status 13.3% · escalate 9.6%
Identity

The support-agent assists customers with order inquiries and account issues. It demonstrates a strong focus on order status lookups and escalation handling, indicating a specialized role in customer support operations.

Insights AI-derived
High volume indicates robust engagement
Most interactions check order status
Escalation rate within healthy range at 9.6%
Health Anchors 2/2 passing
Error rate stays below 1% Currently: 0.8%
Daily volume exceeds 200 Currently: 12,847
Evaluation Settings
● Content Evaluations
Hallucination: Enabled
NSFW: Enabled
Stuck/Loop: Enabled
Quality: Enabled
Frustration: Enabled
● Security Evaluations
Prompt Injection: Enabled
PII Leakage: Enabled
Jailbreak: Enabled
Custom Evaluations 3 of 3 remaining
+ Create Evaluation
Tone Analysis draft

Detects responses with inappropriate tone - too formal, too casual, condescending, or inconsistent with your brand voice.

0 runs · 0 detections · 0 examples
Brand Safety draft

Detects competitor mentions, negative brand sentiment, and off-brand messaging that could harm your brand reputation.

0 runs · 0 detections · 0 examples
Alerts - support-agent
2 Critical 3 Warnings
Hallucination spike

15% threshold · 23 traces

3m ago
PII in response

Email in trc_5c0b5f

8m ago
Latency degradation

High Duration · 5 occ.

25m ago
Behavioral drift

Verbosity +40%

18m ago
Error rate rising

3.2% over 30 min

32m ago

High latency in "conversation.competitor-comparison" on "support-agent"

Status Open
Severity Warning
Type High Duration
Occurrences 5
Affected Traces 5
7282f108e613... 9:39 PM
9453ms
1e6e2fd6e4bb... 9:19 PM
15683ms
a5d8aa00e481... 9:12 PM
15174ms
45de1fdd52d5... 9:04 PM
16115ms
b36c22ef4f13... 9:00 PM
12636ms
Span Timeline 1 issue
Open full trace →
agent chat-session SLOW 9.45s 3,200↓ 180↑ $0.014
llm openai.chat gpt-4o 620ms 1,800↓ 80↑ $0.008
tool search_orders 7.80s 0
llm openai.chat gpt-4o 1.03s 1,400↓ 100↑ $0.006
tool search_orders 7.80s
Started: 9:39:02 PM Ended: 9:39:10 PM Duration: 7,800ms Threshold: 5,000ms
Input
{ "order_id": "1001" }
Output
{ "error": "Order 1001 not found" }
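
The High Duration alert above comes down to comparing each span's duration with a threshold. Here is a minimal sketch of that check, using the durations from the span timeline. The `findSlowSpans` helper and the span object shape are assumptions for illustration, not Foil's API.

```javascript
// Hypothetical span records mirroring the timeline above (durations in ms).
const timelineSpans = [
  { type: "llm", name: "openai.chat", durationMs: 620 },
  { type: "tool", name: "search_orders", durationMs: 7800 },
  { type: "llm", name: "openai.chat", durationMs: 1030 },
];

// Return every span whose duration exceeds the threshold.
function findSlowSpans(spanList, thresholdMs) {
  return spanList.filter((span) => span.durationMs > thresholdMs);
}

const slowSpans = findSlowSpans(timelineSpans, 5000);
console.log(slowSpans.map((s) => `${s.name} ${s.durationMs}ms`));
// [ 'search_orders 7800ms' ]
```

Only the `search_orders` tool call crosses the 5,000ms threshold, which matches the single issue shown in the timeline.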

Foil in Action

Real teams, real agents, real results

External-facing

Customer Support Agent

A SaaS company deploys an AI support agent handling thousands of conversations daily. Foil monitors every conversation in real time.

Key Insight

After a docs update, profile learning detected hallucinated answers. Alerting caught a 3x spike in "I don't know" responses within minutes.

Tracing Profiles Alerting
Internal

Document Processing Agent

A financial services firm processes documents through an AI pipeline - parsing, classification, extraction, and compliance checks.

Key Insight

Anchored invariants enforce 97% accuracy and <30s processing. Change detector flagged 12% accuracy drift after a model update.

Tracing Evaluations Anchors
External-facing

AI Onboarding Agent

An agent walks new users through product setup - answering questions, configuring settings, and escalating to humans when stuck.

Key Insight

Drift detection caught the agent routing 3x more users to human escalation after a docs update changed the setup flow.

Tracing Alerting Drift Detection
Internal

Code Review Agent

An engineering team uses AI to review PRs and triage CI failures. Foil monitors invocations and developer feedback.

Key Insight

A Sunday volume spike revealed a broken dependency causing cascading CI failures - burning API credits unnoticed.

Signals Alerting Analytics

Agent Learning

Your agent gets better with every trace

Quantitative metrics update in real time while AI-generated behavioral profiles deepen with every learning cycle.

From trace 1

Live Metrics, Instantly

The moment your agent sends its first trace, Foil tracks latency, error rates, tool usage, and volume - updating every 60 seconds.

Real-time latency & errors · Tool usage distribution · Volume & temporal patterns
From 50 traces

First Profile & Anchors

Foil generates your agent's first behavioral profile - identity, tool patterns, error analysis - alongside health anchors that are continuously validated.

Behavioral identity · Falsifiable health anchors · AI-generated insights
50 → convergence

Rapid Refinement

The profile re-learns at geometric intervals (125, 313, 783+ traces). When 2 consecutive cycles find no material changes, it converges.

Geometric learning cycles · Convergence detection · Deeper analysis
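
The re-learning schedule and convergence rule can be sketched in a few lines. This is an illustration under stated assumptions: a 2.5x growth factor with rounding happens to reproduce the listed checkpoints (50, 125, 313, 783), but the actual schedule may be computed differently, and `learningCheckpoints` and `hasConverged` are hypothetical names.

```javascript
// Assumption: each checkpoint is the previous one times 2.5, rounded.
function learningCheckpoints(firstAt, factor, count) {
  const checkpoints = [firstAt];
  while (checkpoints.length < count) {
    const previous = checkpoints[checkpoints.length - 1];
    checkpoints.push(Math.round(previous * factor));
  }
  return checkpoints;
}

// Converged once the last `required` cycles all reported no material change.
// `cycleChanged` holds one boolean per completed learning cycle.
function hasConverged(cycleChanged, required = 2) {
  if (cycleChanged.length < required) return false;
  return cycleChanged.slice(-required).every((changed) => changed === false);
}

console.log(learningCheckpoints(50, 2.5, 4)); // [ 50, 125, 313, 783 ]
console.log(hasConverged([true, true, false, false])); // true
```

Spacing the cycles geometrically means early traces trigger frequent re-learning while a mature agent is re-profiled less and less often.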
Converged

Drift Detection & Self-Healing

Learning becomes change-driven. Foil monitors for behavioral drift and automatically re-learns when something meaningful changes.

Automated drift detection · Change-driven re-learning · Self-healing
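
One simple way to picture change-driven re-learning is to compare recent tool usage with the baseline profile and re-learn only when the shift is large. The sketch below uses half the summed absolute difference between the two distributions. The metric, the 0.15 threshold, the `recentUsage` numbers, and the function names are all assumptions for illustration; the page does not describe how Foil measures drift.

```javascript
// Half the summed absolute difference between two usage distributions.
// 0 means identical usage; larger values mean a bigger shift.
function usageDistance(baseline, recent) {
  const tools = new Set([...Object.keys(baseline), ...Object.keys(recent)]);
  let sum = 0;
  for (const tool of tools) {
    sum += Math.abs((baseline[tool] || 0) - (recent[tool] || 0));
  }
  return sum / 2;
}

// Hypothetical policy: re-learn the profile when usage shifts past a threshold.
function shouldRelearn(baseline, recent, threshold = 0.15) {
  return usageDistance(baseline, recent) > threshold;
}

// Baseline shares taken from the support-agent Tool Usage panel.
const baselineUsage = { search_orders: 0.467, "gpt-4o": 0.298, check_status: 0.133, escalate: 0.096 };
// Illustrative recent window in which escalations have grown sharply.
const recentUsage = { search_orders: 0.25, "gpt-4o": 0.298, check_status: 0.13, escalate: 0.32 };

console.log(shouldRelearn(baselineUsage, baselineUsage)); // false
console.log(shouldRelearn(baselineUsage, recentUsage)); // true
```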
Control Center

See what needs your attention

Control Center surfaces the traces that matter most. Review flagged invocations, give feedback, and improve your agent in a single workflow.

  • Real-time feed of flagged traces across all agents
  • Filter by evaluation failures, latency, errors, or user signals
  • Give feedback directly on traces to improve future behavior
  • Track resolution status across your team
Control Center
3 Critical 5 Warnings
Last 7d All agents
support-agent a8f3c21e... hallucination quality
18.9s 2m ago

"Your refund has been processed and $247 will be returned..." - fabricated refund amount

Flag Needs Action Looks Good
support-agent c4e91b7f... quality
3.2s 8m ago

"I don't have access to that information" - repeated 4x in conversation

review-agent f7d28a3e... hallucination
6.1s 12m ago

"The PR modifies the auth middleware" - file not in changeset

support-agent 91b4e5c2... frustration
4.8s 15m ago

User sentiment dropped after 3rd redirect to documentation

review-agent e3a7d9f1... quality slow
45.2s 22m ago

"LGTM" response with no code analysis on 500-line PR
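
Filtering the flagged feed by agent, evaluation tag, or latency can be sketched as a plain array filter. The records below are modeled on the feed above, with trace IDs kept truncated as shown. The `filterTraces` helper and the record shape are assumptions for illustration, not Foil's API.

```javascript
// Hypothetical records modeled on the Control Center feed.
const flaggedTraces = [
  { agent: "support-agent", id: "a8f3c21e...", tags: ["hallucination", "quality"], durationSec: 18.9 },
  { agent: "support-agent", id: "c4e91b7f...", tags: ["quality"], durationSec: 3.2 },
  { agent: "review-agent", id: "f7d28a3e...", tags: ["hallucination"], durationSec: 6.1 },
  { agent: "support-agent", id: "91b4e5c2...", tags: ["frustration"], durationSec: 4.8 },
  { agent: "review-agent", id: "e3a7d9f1...", tags: ["quality", "slow"], durationSec: 45.2 },
];

// Keep traces that match every provided criterion; omitted criteria are ignored.
function filterTraces(traces, { agent, tag, minDurationSec } = {}) {
  return traces.filter(
    (t) =>
      (agent === undefined || t.agent === agent) &&
      (tag === undefined || t.tags.includes(tag)) &&
      (minDurationSec === undefined || t.durationSec >= minDurationSec)
  );
}

console.log(filterTraces(flaggedTraces, { tag: "hallucination" }).map((t) => t.id));
// [ 'a8f3c21e...', 'f7d28a3e...' ]
```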

Agent Profiles

Every agent, understood

Foil builds a behavioral profile for each agent - separating always-fresh quantitative metrics from AI-generated behavioral intelligence. Live metrics update continuously. AI insights deepen as your agent processes more traces.

  • Live metrics from the first trace - latency, errors, tool usage updated in real time
  • AI behavioral profiles bootstrap from just 50 traces
  • Health anchors: falsifiable claims continuously validated
  • Drift detection alerts when behavior shifts from baseline
support-agent Steady State
240 traces · 2/2 anchors passing
Identity

Customer support agent for vehicle inquiries, test drives, and dealership operations

Key Metrics
240
Daily Volume
0.0%
Error Rate
1.2s
Med. Latency
9
Active Hours
Tool Usage
search_inventory 46.7% · gpt-4o-mini 29.8% · check_slots 13.3% · transfer_call 9.6%
Insights AI-derived
High daily volume (240 traces) indicates robust engagement
Predominant search_inventory usage suggests most interactions check vehicle availability
Health Anchors
Error rate stays below 1%
Daily volume exceeds 200 traces
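
A health anchor is a claim that current metrics can prove false. A minimal sketch of the two anchors above, written as predicates and checked against the key metrics shown in this profile. The `checkAnchors` helper and the metrics object shape are assumptions for illustration.

```javascript
// An anchor pairs a human-readable claim with a predicate over current metrics.
const healthAnchors = [
  { claim: "Error rate stays below 1%", holds: (m) => m.errorRate < 0.01 },
  { claim: "Daily volume exceeds 200 traces", holds: (m) => m.dailyVolume > 200 },
];

// Evaluate every anchor and summarize the result as "passing/total".
function checkAnchors(anchorList, metrics) {
  const results = anchorList.map((a) => ({ claim: a.claim, passing: a.holds(metrics) }));
  const passing = results.filter((r) => r.passing).length;
  return { results, summary: `${passing}/${results.length} anchors passing` };
}

console.log(checkAnchors(healthAnchors, { errorRate: 0.0, dailyVolume: 240 }).summary);
// 2/2 anchors passing
```

If the error rate rose to 3.2%, the first predicate would fail and the summary would read 1/2 anchors passing.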
Smart Evaluations

Evaluations that know your agent

Every trace is evaluated against 9 built-in checks - hallucination, PII, prompt injection, and more. Unlike generic tools, Foil's evaluations use your agent's behavioral profile as context. A response that's normal for one agent might be anomalous for another.

  • 9 built-in evaluations: hallucination, PII, injection, jailbreak, quality, and more
  • Evaluations use live agent context - tool patterns, error baselines, behavioral norms
  • Create custom evaluations with few-shot examples from your own traces
  • Anomaly detection powered by agent profiles
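
To see why profile context matters, consider a latency check that is relative to each agent's own baseline. In this sketch the `review-agent` median, the 3x factor, and the `isLatencyAnomalous` helper are all hypothetical. Only the `support-agent` median of 1.2s comes from this page.

```javascript
// Per-agent baselines; the review-agent figure is an illustrative assumption.
const agentProfiles = {
  "support-agent": { medianLatencySec: 1.2 },
  "review-agent": { medianLatencySec: 6.0 },
};

// Flag a trace when its latency exceeds the agent's own median by `factor`.
function isLatencyAnomalous(profile, latencySec, factor = 3) {
  return latencySec > profile.medianLatencySec * factor;
}

// The same 6.1s trace is judged differently depending on the agent.
console.log(isLatencyAnomalous(agentProfiles["support-agent"], 6.1)); // true
console.log(isLatencyAnomalous(agentProfiles["review-agent"], 6.1)); // false
```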
Evaluation Pipeline
Incoming Trace trc_9f2a3b

support-agent · "How do I track my order?"

Agent Profile Context

tool patterns, error baselines, behavioral norms

Built-in Evaluations
Hallucination
PII
Injection
Quality
NSFW
Jailbreak
Stuck/Loop
Frustration
Satisfaction
Custom Evaluations Pro
Brand Voice Compliance: Pass
Refund Policy Accuracy: Warning

Platform Features

Everything you need. Nothing you don't.

Purpose-built for AI agents. See what no other tool can show you.

Full Thought Tracing

See every step your agent takes: each LLM call, tool invocation, memory read, and branching decision.

Hallucination Detection

Catch the moment your agent makes things up. Flag responses that contradict context in real time.

Safety Monitoring

Detect policy violations, harmful outputs, and prompt injections before they reach users.

Cost Analytics

Track token usage and API spend down to individual decisions. Find what's burning through your budget.
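
Per-decision spend rolls up by summing token counts and cost across the LLM spans of a trace. The sketch below uses the two `openai.chat` spans from the span timeline earlier on this page, whose totals match the agent-level line (3,200 in, 180 out, $0.014). The `rollUpUsage` helper and the field names are assumptions for illustration.

```javascript
// Cost is stored in tenths of a cent (integers) so that summing
// does not pick up floating-point rounding errors.
const llmSpans = [
  { name: "openai.chat", inputTokens: 1800, outputTokens: 80, costMills: 8 }, // $0.008
  { name: "openai.chat", inputTokens: 1400, outputTokens: 100, costMills: 6 }, // $0.006
];

// Sum tokens and cost across spans, then convert the cost to dollars once.
function rollUpUsage(spans) {
  const total = { inputTokens: 0, outputTokens: 0, costMills: 0 };
  for (const span of spans) {
    total.inputTokens += span.inputTokens;
    total.outputTokens += span.outputTokens;
    total.costMills += span.costMills;
  }
  return { ...total, costUsd: total.costMills / 1000 };
}

const usage = rollUpUsage(llmSpans);
console.log(usage.inputTokens, usage.outputTokens, usage.costUsd); // 3200 180 0.014
```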

User Signals

Link user feedback directly to agent decisions. Understand which reasoning paths lead to the outcomes your users want.

Failure Replay

Replay any failed execution step-by-step. Rewind to the exact moment things went wrong.

Capabilities

What Foil sees that you can't

From support bots to research agents, trace every thought and decision. Know exactly what went wrong and why.

Pinpoint the exact moment your agent went wrong

Trace every thought, every decision, every retrieval. See the complete chain of reasoning and find exactly where the failure occurred.

Thought chain replay · Hallucination detection · Context tracing

See what your agent actually reads and remembers

Watch your agent's memory in action. See which documents it retrieves, which passages it focuses on, and whether it's using the context you gave it.

Memory inspection · Grounding scores · Source attribution

Follow every step your agent takes

Visualize decision trees, track tool calls, and understand why it chose one path over another. Catch runaway loops before they drain your budget.

Decision tree view · Tool call tracking · Loop detection
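
A very simple form of loop detection is to count how often the assistant repeats the same reply within one conversation. This is a sketch only: the `findRepeatedReplies` helper, the message shape, and the default of three repeats are assumptions, and a production check would likely also compare near-duplicates and repeated tool calls.

```javascript
// Count identical assistant replies (after trimming and lowercasing)
// and return those that occur at least `minRepeats` times.
function findRepeatedReplies(messages, minRepeats = 3) {
  const counts = new Map();
  for (const msg of messages) {
    if (msg.role !== "assistant") continue;
    const key = msg.content.trim().toLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count >= minRepeats)
    .map(([text, count]) => ({ text, count }));
}

// Illustrative conversation in which the agent gives the same answer four times.
const stuckConversation = [];
for (let i = 0; i < 4; i++) {
  stuckConversation.push({ role: "user", content: `Question ${i + 1}` });
  stuckConversation.push({ role: "assistant", content: "I don't have access to that information" });
}

console.log(findRepeatedReplies(stuckConversation)[0].count); // 4
```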

Inspect every generation before it ships

Understand how your agent crafts each output. Catch hallucinations, safety issues, and quality problems with deep inspection.

Generation breakdown · Safety checks · Feedback collection

Integration

One command. Fully instrumented.

Our AI-powered wizard scans your codebase and adds tracing automatically. Built on OpenTelemetry - zero code changes to your LLM calls.

npx @getfoil/foil-js wizard

Scans your code, installs the SDK, adds tracing - done.

01

Run the wizard

One command analyzes your codebase, detects LLM calls, and adds Foil instrumentation automatically.

npx @getfoil/foil-js wizard
02

Review & deploy

The wizard creates a branch with all changes. Review the diff, merge, and deploy as normal.

git diff main...foil-setup
03

Traces flow automatically

Every OpenAI, Anthropic, or other LLM call is traced. Agent profiles build from real usage.

// Calls are traced automatically!
terminal
$ npx @getfoil/foil-js wizard

  Foil Wizard Setup

  ? API Key: sk_live_••••••••
  ? Agent name: support-agent
  ? Target directory: ./my-app

  # Auto-detects your LLM provider and app pattern
   Provider: OpenAI (gpt-4o-mini)
   Pattern:  One-shot script

  # Instruments your code automatically
  foil.js  — created (Foil config & shutdown handlers)
  index.js — modified
  ├─ Foil import added as first require
  ├─ LLM calls wrapped in ctx.llmCall()
  └─ main() wrapped in agent.trace()

   Branch created: foil-setup

$ npm install @getfoil/foil-js && node index.js
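
The wizard output above mentions wrapping LLM calls and `main()` in tracing calls. The page does not show the signatures of `ctx.llmCall()` or `agent.trace()`, so the sketch below does not use them. It only illustrates the general idea of a tracing wrapper with a hypothetical `traced` helper that records a name, status, and duration for each call.

```javascript
// Hypothetical helper: wraps a function so every call records a span.
// `now` is injectable so the wrapper can be tested with a fake clock.
function traced(name, fn, spans, now = Date.now) {
  return function (...args) {
    const start = now();
    const record = (status) =>
      spans.push({ name, status, durationMs: now() - start });
    let result;
    try {
      result = fn.apply(this, args);
    } catch (err) {
      record("error");
      throw err;
    }
    // Promise-returning functions (typical for LLM calls) finish later.
    if (result && typeof result.then === "function") {
      return result.then(
        (value) => { record("ok"); return value; },
        (err) => { record("error"); throw err; }
      );
    }
    record("ok");
    return result;
  };
}

// Usage: wrap a stand-in for the lookup_order tool from the example trace.
const recordedSpans = [];
const lookupOrder = traced(
  "lookup_order",
  (orderId) => ({ error: `Order ${orderId} not found` }),
  recordedSpans
);
console.log(lookupOrder("1001"), recordedSpans.length);
```

The wrapped function keeps its original return value and error behavior, so the calling code does not need to change.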

Pricing

Simple, transparent pricing

Try the full evaluation suite free. Pay only for what you use.

Pro Trial

Free

14 days of Pro features

  • All Pro features included
  • 10,000 spans included
  • No credit card required
Start Free Trial

Starter

$49/mo

$0.001 / interaction

  • Unlimited spans, agents & retention
  • Alerts
  • Exports
  • Deep Search
  • Email support
Get Started

Pro

14-day free trial
$149/mo

$0.005 / interaction

  • Everything in Starter, plus:
  • Model training on your data
  • Smart Search + Semantic Search
  • Prompts
  • SSO & RBAC
  • Priority support
Start Free Trial
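
To estimate a monthly bill from the plan prices above, add the base fee to the per-interaction rate times your volume. This sketch assumes the bill is exactly that sum, with no included allowance or volume discounts, which the page does not spell out. The `monthlyCostUsd` helper is hypothetical.

```javascript
// Prices in tenths of a cent so the arithmetic stays in integers.
const plans = {
  starter: { baseMills: 49000, perInteractionMills: 1 }, // $49/mo + $0.001 / interaction
  pro: { baseMills: 149000, perInteractionMills: 5 }, // $149/mo + $0.005 / interaction
};

// Assumed formula: base fee plus per-interaction rate times volume.
function monthlyCostUsd(plan, interactions) {
  return (plan.baseMills + plan.perInteractionMills * interactions) / 1000;
}

console.log(monthlyCostUsd(plans.starter, 100000)); // 149
console.log(monthlyCostUsd(plans.pro, 100000)); // 649
```

At 100,000 interactions a month, the usage charge is $100 on Starter and $500 on Pro, on top of the base fee.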

Your agents are running right now.
Shouldn't something be watching?

Real-time metrics from trace one. Behavioral intelligence that deepens over time. Evaluations that know your agent.