Customer-facing support copilot, deflecting 38% of L1 tickets

A grounded copilot embedded in a SaaS product's help center — answers from real docs, escalates honestly when it doesn't know, and stays on-brand.

The problem

The client had been running a homegrown chatbot for two years. Tickets weren't going down. CSAT went down in cohorts that interacted with it. Their team was stuck between "ship a real LLM" and "stop hallucinating about our pricing."

The ask was narrower than it looked: deflect L1 tickets (password resets, billing FAQ, integration setup) without ever inventing policy.

What we built

A retrieval-grounded copilot with a hard constraint: no answer without a cited source from approved docs.

The architecture leans heavily on the model for understanding, and not at all for knowledge:

Query understanding — classify intent, decide whether to retrieve, decide whether to escalate. Small, fast, and tunable.
Retrieval — semantic + keyword hybrid over the help-center corpus, with a freshness prior so newly published docs are surfaced fast.
Constrained generation — the model is given retrieved passages and instructed to answer only from them, with citation IDs inline. A post-processor strips any paragraph that lacks a verifiable citation.
Honest fallback — when no passage clears a confidence threshold, the bot says so plainly and offers to open a ticket with context pre-filled.
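The retrieval and honest-fallback steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production retriever. Embeddings are assumed to be precomputed, and a simple term-overlap score stands in for a real keyword ranker such as BM25. The names (`Passage`, `hybrid_search`, `answer_or_escalate`), the weights `alpha` and `beta`, the 30-day half-life, and the 0.5 threshold are all hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Passage:
    doc_id: str
    text: str
    embedding: list[float]  # assumed precomputed by whatever embedding model is in use
    age_days: float         # days since the source doc was published


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    # Fraction of query terms found in the passage (a stand-in for BM25).
    q_terms = set(query.lower().split())
    return len(q_terms & set(text.lower().split())) / len(q_terms) if q_terms else 0.0


def freshness_prior(age_days: float, half_life_days: float = 30.0) -> float:
    # Exponential decay: 1.0 for a doc published today, 0.5 after one half-life.
    return 0.5 ** (age_days / half_life_days)


def hybrid_search(query, query_emb, passages, alpha=0.6, beta=0.1, top_k=3):
    """Rank by blended relevance plus a small freshness bonus.

    Returns (score, relevance, passage) tuples, best first.
    """
    ranked = []
    for p in passages:
        relevance = (alpha * cosine(query_emb, p.embedding)
                     + (1 - alpha) * keyword_score(query, p.text))
        ranked.append((relevance + beta * freshness_prior(p.age_days), relevance, p))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked[:top_k]


def answer_or_escalate(query, query_emb, passages, threshold=0.5):
    hits = hybrid_search(query, query_emb, passages)
    # Gate on relevance, not the freshness-boosted score, so recency alone
    # can never make a passage look like a confident match.
    grounded = [p for _, relevance, p in hits if relevance >= threshold]
    if not grounded:
        return {
            "action": "escalate",
            "message": "I couldn't find this in our help docs. Want me to open a ticket?",
            # Pre-filled context for the human agent who picks up the ticket.
            "ticket_context": {"query": query, "candidates": [p.doc_id for _, _, p in hits]},
        }
    return {"action": "answer", "passages": grounded}
```

One design point worth keeping: the freshness prior only reorders results, while the confidence gate looks at relevance alone. A brand-new but off-topic doc therefore cannot push the bot past its fallback threshold.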
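The citation post-processor in the constrained-generation step could look roughly like this. The `[doc:ID]` inline citation format and the `strip_uncited` helper are hypothetical. The sketch treats a citation as verifiable only when its ID belongs to the set of passages actually retrieved for that turn. It drops any paragraph that has no citation or that cites any unknown ID.

```python
from __future__ import annotations

import re

# Hypothetical inline citation format, e.g. "... [doc:reset-password]".
CITATION = re.compile(r"\[doc:([A-Za-z0-9_-]+)\]")


def strip_uncited(answer: str, retrieved_ids: set[str]) -> str:
    """Keep only paragraphs whose citations all point at retrieved passages."""
    kept = []
    for para in answer.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        cited = set(CITATION.findall(para))
        # Drop paragraphs with no citation, or citing anything not retrieved this turn.
        if cited and cited <= retrieved_ids:
            kept.append(para)
    return "\n\n".join(kept)
```

If every paragraph is stripped, the empty string is one reasonable trigger for the honest-fallback path, which avoids sending a blank reply.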

Safety, not "guardrails"

We refused to ship "guardrails" as a layer of regex. Safety was enforced structurally:

The model never sees pricing or billing data. Those answers come from a deterministic FAQ retrieval path with hardcoded responses.
Account-specific questions (entitlements, current plan) are handled by a tool call to the client's API. The LLM doesn't answer them; it routes.
Every response is logged with its retrieval set for auditability. We can replay any conversation.
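The structural separation above amounts to routing by intent before any model-generated text can reach the user. Below is a minimal sketch that assumes intents arrive from the query-understanding step. The intent labels (`billing`, `account`), the `BILLING_FAQ` table with placeholder text, the `account_api` and `grounded_llm` callables, and the JSON-lines audit format are all hypothetical stand-ins for the client's real components.

```python
from __future__ import annotations

import json
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Response:
    text: str
    path: str  # which structural path produced the answer
    retrieval_set: list[str] = field(default_factory=list)  # doc IDs shown to the model


# Hypothetical approved, hardcoded billing answers keyed by FAQ entry.
# The LLM never sees this table.
BILLING_FAQ = {
    "update_card": "[approved FAQ text: updating a payment method]",
    "invoice_copy": "[approved FAQ text: downloading an invoice]",
}

ESCALATE_TEXT = "I don't have an approved answer for that. Want me to open a ticket?"


def handle(
    intent: str,
    query: str,
    *,
    account_id: str,
    faq_key: str | None,
    account_api: Callable[[str], dict],
    grounded_llm: Callable[[str], Response],
    audit_log: list[str],
) -> Response:
    if intent == "billing":
        # Deterministic path: exact-key lookup, no model involved.
        text = BILLING_FAQ.get(faq_key or "")
        resp = Response(text, "faq") if text else Response(ESCALATE_TEXT, "escalate")
    elif intent == "account":
        # Answer is templated from the client's API data, not generated.
        plan = account_api(account_id)["plan"]
        resp = Response(f"You're currently on the {plan} plan.", "tool")
    else:
        # Everything else goes through retrieval-grounded generation.
        resp = grounded_llm(query)

    # Log every response with its retrieval set so the turn can be replayed.
    audit_log.append(json.dumps({
        "ts": time.time(),
        "intent": intent,
        "query": query,
        "path": resp.path,
        "retrieval_set": resp.retrieval_set,
        "text": resp.text,
    }))
    return resp
```

Because the billing and account branches never call the model, a prompt change cannot alter what they say. Only the generic branch depends on generation quality.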

Eval discipline

Before launch we built an eval suite of 1,800 real prior tickets, hand-labeled with ideal outcomes (deflect, escalate-with-context, escalate-cold). Every prompt change, every retriever tweak, every model swap was scored against it. A release shipped only if escalation precision did not regress.
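The release gate fits in a few lines. The write-up doesn't define escalation precision exactly, so this sketch assumes the following meaning: of the tickets the copilot chose to escalate (either kind), the fraction whose hand label was also an escalation. The function names and the `tolerance` parameter are hypothetical.

```python
from __future__ import annotations


def is_escalation(label: str) -> bool:
    # Both "escalate-with-context" and "escalate-cold" count as escalations.
    return label.startswith("escalate")


def escalation_precision(gold: list[str], predicted: list[str]) -> float:
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted labels must align one-to-one")
    escalated_gold = [g for g, p in zip(gold, predicted) if is_escalation(p)]
    if not escalated_gold:
        # Undefined when nothing was escalated; NaN makes the gate below fail closed.
        return float("nan")
    return sum(is_escalation(g) for g in escalated_gold) / len(escalated_gold)


def release_gate(gold, baseline_pred, candidate_pred, tolerance: float = 0.0) -> bool:
    """Ship only if the candidate's escalation precision does not regress."""
    baseline = escalation_precision(gold, baseline_pred)
    candidate = escalation_precision(gold, candidate_pred)
    # Any comparison with NaN is False, so an undefined score blocks the release.
    return candidate >= baseline - tolerance
```

Other numbers, such as deflection rate or the with-context vs. cold split, could be reported from the same labels without gating on them.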

Outcome

38% L1 deflection in the first three months, sustained over nine months.
+0.4 CSAT in cohorts that interacted with the copilot vs. the prior chatbot baseline.
Zero compliance incidents — no invented policy, no leaked data, no fabricated pricing.
p95 latency of 1.5 s end-to-end, including retrieval and generation.

The copilot is now extended by the client's product team. We rotated off after handoff and a 30-day stabilization window.