two years · 14 deployments · Claude · OpenAI · Gemini

Production AI engineering on Claude, OpenAI, and Gemini.

anthropic

Claude

Long-context reasoning, tool use, and structured outputs. Where output quality has to clear an eval gate, this is usually the call.

openai

OpenAI

GPT models for breadth and routing, embeddings for retrieval, fine-tuning where a narrow task earns it. The default for the hot path on most engagements.

google

Gemini

Million-token context, multimodal inputs, and Vertex deployment. The right call when the workload is GCP-native or context lengths are extreme.

We pick the provider per engagement — sometimes per route — based on your data, latency budget, and compliance posture. No off-the-shelf templates; every system is purpose-built around the constraints that actually bind.
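What per-route selection can look like in practice: a minimal sketch, assuming a hypothetical `Route` record and `pick_provider` helper (neither is a real library API or our production code). It encodes only the rules stated above (GCP-native or extreme context favors Gemini, eval-gated output favors Claude, OpenAI as the hot-path default), plus an allow-list standing in for compliance posture. The context cutoff is an illustrative assumption, and latency budgets are left out to keep it short.

```python
from dataclasses import dataclass

# Hypothetical cutoff for "extreme" context; not a measured or documented limit.
EXTREME_CONTEXT_TOKENS = 200_000

ALL_PROVIDERS = frozenset({"claude", "openai", "gemini"})


@dataclass(frozen=True)
class Route:
    """Illustrative per-route requirements; field names are assumptions."""
    name: str
    context_tokens: int                  # longest prompt the route must handle
    gcp_native: bool                     # workload already deployed on GCP
    needs_eval_gate: bool                # output must clear an offline eval
    allowed: frozenset = ALL_PROVIDERS   # providers the compliance posture permits


def pick_provider(route: Route) -> str:
    """Return the first permitted provider from a rule-chosen preference order."""
    if route.gcp_native or route.context_tokens > EXTREME_CONTEXT_TOKENS:
        preferences = ["gemini", "claude", "openai"]
    elif route.needs_eval_gate:
        preferences = ["claude", "openai", "gemini"]
    else:
        preferences = ["openai", "claude", "gemini"]  # default for the hot path
    for provider in preferences:
        if provider in route.allowed:
            return provider
    raise ValueError(f"no permitted provider for route {route.name!r}")


print(pick_provider(Route("support-chat", 3_000, False, False)))      # openai
print(pick_provider(Route("contract-review", 60_000, False, True)))   # claude
print(pick_provider(Route("pii-redaction", 3_000, False, False,
                          allowed=frozenset({"claude"}))))            # claude
```

Falling back through a preference list, rather than returning a single hard-coded answer, means a compliance restriction narrows the choice without rewriting the routing rules.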

Featured case study


case study · B2B SaaS

Multi-tenant inference platform: 71% cost reduction at 4× the traffic

An internal inference platform for a SaaS company. Continuous batching, semantic caching, KV-cache reuse, model routing, per-tenant cost attribution. Built so the CFO can read the dashboard.

9.4M requests/day · 38 customer tenants · p99 < 2.1s · 99.97% uptime
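Per-tenant cost attribution comes down to metering every request against the tenant that made it. A minimal sketch, assuming a hypothetical usage-record schema and placeholder model names and per-token prices (none of these come from the case study above), with semantic-cache hits counted as zero provider spend:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

# Placeholder model names and prices (USD per million tokens); real rates vary and change.
PRICE_PER_MTOK = {
    "model-large": {"input": 3.00, "output": 15.00},
    "model-small": {"input": 0.50, "output": 1.50},
}


@dataclass(frozen=True)
class Usage:
    """One metered request; this schema is hypothetical."""
    tenant: str
    model: str
    input_tokens: int
    output_tokens: int
    cache_hit: bool = False  # answered from a semantic cache, so no provider call was billed


def cost_by_tenant(records: Iterable[Usage]) -> Dict[str, float]:
    """Sum provider spend per tenant; cache hits are recorded at zero cost."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        if r.cache_hit:
            cost = 0.0
        else:
            price = PRICE_PER_MTOK[r.model]
            cost = (r.input_tokens * price["input"]
                    + r.output_tokens * price["output"]) / 1_000_000
        totals[r.tenant] += cost
    return dict(totals)


records = [
    Usage("acme", "model-large", 1_000_000, 0),
    Usage("acme", "model-small", 0, 2_000_000),
    Usage("globex", "model-large", 500_000, 100_000),
    Usage("globex", "model-large", 500_000, 100_000, cache_hit=True),
]
print(cost_by_tenant(records))  # {'acme': 6.0, 'globex': 3.0}
```

Keeping cache hits in the ledger at zero cost, rather than dropping them, lets the same records show each tenant's cache savings alongside its spend.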

Tell us about the problem.

Two paragraphs is enough. We get back to you within one business day with whether we’re a fit and what we’d look at first.

contact@zhironghuang.com · +1 (561) 501-1280

