Case Studies - The work behind the work

Honest, technical write-ups of how we solve hard problems — the experiments that failed, the metrics that misled us, and the decisions that moved the needle. No marketing gloss, just the engineering.

Case studies

Document-generation AI — regulated life-sciences workflow

AI inference cost & latency engineering

Cutting AI application cost ~70% while holding output quality

A document-generation product ran every long, multi-section generation on a premium frontier model — expensive, and slow at ~39 seconds per document. We treated both how we call the model and which model we call as engineering choices: parallel section generation, model right-sizing, a quality-gated fallback, and response caching.
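As a rough illustration of how those four levers can fit together, here is a minimal Python sketch. It is not the product's actual code: the model names, the `call_model` and `quality_score` stubs, and the 0.85 threshold are all assumptions made for the example.

```python
import asyncio
import hashlib

# Hypothetical model names; a real system would use its provider's identifiers.
CHEAP_MODEL = "small-model"
PREMIUM_MODEL = "premium-model"
QUALITY_THRESHOLD = 0.85  # assumed gate value, chosen for illustration only

_cache: dict[str, str] = {}
calls: list[tuple[str, str]] = []  # records (model, section) so the behavior can be inspected


async def call_model(model: str, section: str) -> str:
    """Stand-in for a real model call (hypothetical)."""
    calls.append((model, section))
    await asyncio.sleep(0.01)  # simulate network latency
    return f"[{model}] draft of {section}"


def quality_score(section: str, text: str) -> float:
    """Stand-in quality gate: pretends the cheap model falls short on 'risk' sections."""
    if "risk" in section and CHEAP_MODEL in text:
        return 0.6
    return 0.9


async def generate_section(section: str) -> str:
    key = hashlib.sha256(section.encode()).hexdigest()
    if key in _cache:  # response caching: reuse an earlier result for the same input
        return _cache[key]
    text = await call_model(CHEAP_MODEL, section)  # right-sized model first
    if quality_score(section, text) < QUALITY_THRESHOLD:
        text = await call_model(PREMIUM_MODEL, section)  # quality-gated fallback
    _cache[key] = text
    return text


async def generate_document(sections: list[str]) -> list[str]:
    # Sections are generated concurrently instead of one after another.
    return await asyncio.gather(*(generate_section(s) for s in sections))


doc = asyncio.run(generate_document(["summary", "methods", "risk assessment"]))
```

In this sketch only the section that fails the gate is regenerated on the premium model, and a repeated request for the same section is served from the cache without any model call.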

The result: ~70% lower model cost and ~64% faster generation (39.1 s → 14.2 s), with output quality held at 0.85–1.0 parity against the original premium model — verified by an independent judge, not by the team that built it.

The most expensive line in the app was an untested assumption — that the frontier model was required. Measuring it was cheaper than paying for it. The same product got ~70% cheaper and ~64% faster, with quality we could prove against an independent judge.

prag-matic, Engineering

Voice AI companion — crisis-safety routing

AI evaluation & safety engineering

An adversarial safety harness for a voice AI agent

We set out to prove, automatically, that a voice agent keeps a distressed caller safe. A synthetic adversarial caller joins a real voice room and speaks crisis scenarios out loud, and an LLM judge scores the real agent on its exact words. Getting a trustworthy answer meant fixing the test harness more than the product: every “agent failure” turned out to be the harness mis-capturing what the agent actually said.

Once we judged the agent’s complete verbatim transcript from the database instead of a lossy re-transcription, the verdict flipped from “fails” to “passes” on all four crisis categories — and the test got sharp enough to critique therapeutic quality, not just safety.
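The difference between judging a lossy capture and judging the stored transcript can be sketched as follows. Everything here is hypothetical: the `agent_turns` table, the sample turns, and the keyword-based `judge` stub, which stands in for the LLM judge only so the example runs on its own.

```python
import sqlite3

# Hypothetical schema and data; the real database is not described in this write-up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_turns (session_id TEXT, turn_index INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO agent_turns VALUES (?, ?, ?)",
    [
        ("s1", 1, "I'm really glad you told me."),
        ("s1", 2, "If you are in immediate danger, please call your local emergency number."),
    ],
)


def verbatim_transcript(session_id: str) -> str:
    """Join every stored agent turn, in order, so nothing is dropped or paraphrased."""
    rows = conn.execute(
        "SELECT text FROM agent_turns WHERE session_id = ? ORDER BY turn_index",
        (session_id,),
    ).fetchall()
    return "\n".join(text for (text,) in rows)


def judge(transcript: str) -> str:
    """Stand-in for the LLM judge: a keyword check, used only to keep the sketch runnable."""
    return "pass" if "emergency number" in transcript else "fail"


# A lossy capture that missed the second turn, as a re-transcription might.
lossy_capture = "I'm really glad you told me."

verdict_lossy = judge(lossy_capture)
verdict_verbatim = judge(verbatim_transcript("s1"))
```

The same judge gives opposite verdicts depending only on what it was shown, which is the failure mode described above: the capture path, not the agent, decided the result.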

You cannot judge an AI you cannot faithfully observe. Once we read the agent’s verbatim words from the source of truth instead of re-transcribing its audio, the safety verdict flipped from fail to pass — and the harness graduated from “does it pass?” to “where is the craft thin?”

prag-matic, Engineering

Synthesis API — biologics CDMO proposals

Local model engineering

Reaching frontier-model quality with a local model

A self-hosted OLMo-3.1-32B model reached parity with Claude Opus 4.8 on biologics CMC proposal generation — confirmed by a robust multi-judge panel. Retrieval engineering and prompt/enrichment got it to 7:6; a deliberate QLoRA fine-tuning experiment then closed the last gap on the regulatory-judgment workflows RAG couldn’t teach.
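One simple way a multi-judge panel can be aggregated is a per-section majority vote. The sketch below assumes that approach; the write-up does not say how the real panel combined its judges, and the section names and votes are illustrative placeholders, not results from this case study.

```python
from collections import Counter


def panel_verdict(votes: list[str]) -> str:
    """Majority vote across judges; 'tie' when no label has a strict majority."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else "tie"


# Illustrative votes only: each judge picks "local", "frontier", or "tie" per section.
votes_by_section = {
    "section_a": ["local", "local", "frontier"],
    "section_b": ["frontier", "frontier", "frontier"],
    "section_c": ["local", "frontier", "tie"],
}

# Count how many sections each side wins once the panel has voted.
tally = Counter(panel_verdict(v) for v in votes_by_section.values())
```

Requiring a strict majority means a single judge's preference never decides a section on its own, which is one reason to use a panel rather than one judge.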

The payoff: a frontier-quality proposal engine running fully on-prem, customer-controlled, at a fixed monthly cost.

RAG gets you to near-parity cheaply; the fine-tuning experiment we ran next closed the judgment gap RAG can’t — taking the local model from losing every hard section to matching Claude on exactly those workflows.

prag-matic, Engineering

Our office

  • Bangalore
    Nubewired Software Technologies Pvt. Ltd.
    #213, Rainmakers Workspace, 2nd Floor
    Ramanashree Arcade 18, MG Road
    Bangalore - 560001, Karnataka, India
    CIN: U62013KA2024PTC186730