
Case study · Jul 15, 2025

Customer-facing support copilot, deflecting 38% of L1 tickets

A grounded copilot embedded in a SaaS product's help center — answers from real docs, escalates honestly when it doesn't know, and stays on-brand.

Client: B2B SaaS, ~50k seats
Industry: SaaS
Scale: 1.2M monthly conversations · p95 < 1.5s · 99.95% uptime
Stack: Anthropic Claude · Pinecone · Cloudflare Workers · Postgres · Datadog
Outcome: 38% L1 ticket deflection at +0.4 CSAT vs. baseline; zero compliance incidents in 9 months.

The problem

The client had been running a homegrown chatbot for two years. Ticket volume wasn't falling, and CSAT dropped in the cohorts that interacted with it. Their team was stuck between "ship a real LLM" and "stop hallucinating about our pricing."

The ask was narrower than it looked: deflect L1 tickets (password resets, billing FAQ, integration setup) without ever inventing policy.

What we built

A retrieval-grounded copilot with a hard constraint: no answer without a cited source from approved docs.

The architecture leans heavily on the model for understanding, and not at all for knowledge:

  • Query understanding — classify intent, decide whether to retrieve, decide whether to escalate. Small, fast, and tunable.
  • Retrieval — semantic + keyword hybrid over the help-center corpus, with a freshness prior so newly published docs are surfaced fast.
  • Constrained generation — the model is given retrieved passages and instructed to answer only from them, with citation IDs inline. A post-processor strips any paragraph that lacks a verifiable citation.
  • Honest fallback — when no passage clears a confidence threshold, the bot says so plainly and offers to open a ticket with context pre-filled.
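The last two steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production code: the `[doc:<id>]` citation format, the `Passage` type, the `0.55` threshold, and the fallback wording are all hypothetical. Here a citation counts as "verifiable" only if its ID belongs to a passage that was actually retrieved and cleared the threshold.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical inline citation format: [doc:<id>]
CITATION_RE = re.compile(r"\[doc:([A-Za-z0-9_-]+)\]")

# Assumed value for illustration; the real threshold is not stated.
CONFIDENCE_THRESHOLD = 0.55

# Hypothetical fallback wording.
FALLBACK_MESSAGE = (
    "I couldn't find this in our help docs. "
    "Would you like me to open a ticket with this conversation attached?"
)


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str
    score: float  # retrieval confidence, assumed to lie in [0, 1]


def usable_passages(passages: list[Passage],
                    threshold: float = CONFIDENCE_THRESHOLD) -> list[Passage]:
    """Keep only passages that clear the confidence threshold."""
    return [p for p in passages if p.score >= threshold]


def strip_uncited(answer: str, allowed_ids: set[str]) -> str:
    """Drop every paragraph that has no citation, or cites an unknown ID."""
    kept = []
    for paragraph in answer.split("\n\n"):
        cited = CITATION_RE.findall(paragraph)
        if cited and all(doc_id in allowed_ids for doc_id in cited):
            kept.append(paragraph.strip())
    return "\n\n".join(kept)


def finalize(answer: str, passages: list[Passage]) -> str:
    """Return the cleaned answer, or the fallback if nothing survives."""
    usable = usable_passages(passages)
    if not usable:
        return FALLBACK_MESSAGE
    cleaned = strip_uncited(answer, {p.doc_id for p in usable})
    return cleaned or FALLBACK_MESSAGE
```

Because the check runs after generation, a paragraph the model wrote without grounding is removed even if the prompt instructions were ignored, and an answer that loses every paragraph degrades to the fallback rather than to an empty reply.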

Safety, not "guardrails"

We refused to ship "guardrails" as a layer of regex. Safety was enforced structurally:

  • The model never sees pricing or billing data. Those answers come from a deterministic FAQ retrieval path with hardcoded responses.
  • Account-specific questions (entitlements, current plan) are handled by a tool call to the client's API. The LLM doesn't answer them; it routes.
  • Every response is logged with its retrieval set for auditability. We can replay any conversation.
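As a rough sketch of what "structural" means here, the routing below never hands pricing, billing, or account intents to the generation path at all. Everything named in it is an assumption for illustration: the intent labels, the `HARDCODED_FAQ` text, the `account_api` and `grounded_answer` callables, and the in-memory `AuditLog` (the case study lists Postgres in the stack but does not describe the log schema).

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Route(Enum):
    FAQ = "faq"            # deterministic, hardcoded responses
    ACCOUNT = "account"    # tool call to the client's API
    GROUNDED = "grounded"  # retrieval + constrained generation


# Hypothetical intent labels and placeholder responses.
HARDCODED_FAQ = {
    "pricing": "Current plans and prices are listed on the pricing page.",
    "billing": "Invoices are available under Settings > Billing.",
}
ACCOUNT_INTENTS = {"entitlements", "current_plan"}


def route_for(intent: str) -> Route:
    """Pick a path from the intent alone; the model is not consulted."""
    if intent in HARDCODED_FAQ:
        return Route.FAQ
    if intent in ACCOUNT_INTENTS:
        return Route.ACCOUNT
    return Route.GROUNDED


@dataclass
class AuditLog:
    """In-memory stand-in for a durable audit store."""
    records: list[dict] = field(default_factory=list)

    def record(self, conversation_id: str, route: Route,
               response: str, retrieval_ids: list[str]) -> None:
        self.records.append({
            "conversation_id": conversation_id,
            "route": route.value,
            "response": response,
            "retrieval_ids": list(retrieval_ids),
            "logged_at": time.time(),
        })


def handle(conversation_id: str, intent: str, question: str, *,
           account_api: Callable[[str], str],
           grounded_answer: Callable[[str], tuple[str, list[str]]],
           audit: AuditLog) -> str:
    """Answer one turn and log it together with its retrieval set."""
    route = route_for(intent)
    retrieval_ids: list[str] = []
    if route is Route.FAQ:
        response = HARDCODED_FAQ[intent]
    elif route is Route.ACCOUNT:
        response = account_api(intent)
    else:
        response, retrieval_ids = grounded_answer(question)
    audit.record(conversation_id, route, response, retrieval_ids)
    return response
```

The point of the design is that the generation callable is unreachable from the FAQ and account branches, so a prompt change cannot cause the model to start answering pricing questions.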

Eval discipline

Before launch we built an eval suite of 1,800 real prior tickets, hand-labeled with ideal outcomes (deflect, escalate-with-context, escalate-cold). Every prompt change, every retriever tweak, every model swap was scored against it. Releases were gated on no regression in escalation precision.
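A release gate of this kind might look like the sketch below. The definition of escalation precision used here is an assumption: of the tickets the system escalated, the fraction whose hand label is also an escalation (either kind). The `EvalCase` type and the `tolerance` parameter are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Outcome labels from the eval suite; "deflect" is the non-escalation outcome.
ESCALATIONS = {"escalate-with-context", "escalate-cold"}


@dataclass(frozen=True)
class EvalCase:
    ticket_id: str
    expected: str   # hand-labeled ideal outcome
    predicted: str  # what the system under test did


def escalation_precision(cases: list[EvalCase]) -> float:
    """Share of predicted escalations that the labels agree should escalate."""
    predicted = [c for c in cases if c.predicted in ESCALATIONS]
    if not predicted:
        # Assumed convention: a run that never escalates scores 0.0,
        # so it cannot pass the gate by avoiding escalation entirely.
        return 0.0
    correct = sum(1 for c in predicted if c.expected in ESCALATIONS)
    return correct / len(predicted)


def gate_release(baseline: list[EvalCase], candidate: list[EvalCase],
                 tolerance: float = 0.0) -> bool:
    """Allow the release only if precision has not regressed."""
    return (escalation_precision(candidate) + tolerance
            >= escalation_precision(baseline))
```

In practice the same suite would be run once for the current release and once for the candidate, and the comparison would block the deploy when `gate_release` returns `False`.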

Outcome

  • 38% L1 deflection in the first three months, sustained over nine.
  • +0.4 CSAT in cohorts that interacted with the copilot vs. the prior chatbot baseline.
  • Zero compliance incidents — no invented policy, no leaked data, no fabricated pricing.
  • p95 under 1.5s end-to-end, including retrieval and generation.

The client's product team now extends the copilot. We rotated off after handoff and a 30-day stabilization window.
