Bring your data. Leave with a model.
A medical SLM, fine-tuned on your proprietary data, packaged to run on the CPUs you already own. No GPU procurement. No vendor lock-in. No ML team required.
RAG retrieves. Fine-tune internalises.
Retrieval lets a generic model look up your data at inference time. Fine-tuning teaches your model the language, structure, and judgement of your domain — so every output is faster, cheaper, and grounded in how your team actually works.
- Lower latency — no retrieval round-trip for routine reasoning.
- Lower cost — smaller models replace expensive frontier API calls.
- Stronger privacy — model weights stay in your environment.
- Higher consistency — your tone, your formats, your terminology.
The lifecycle
- 01 · Upload (minutes). Drop in PDFs, EMR exports, structured tables, internal SOPs. We fingerprint and version everything.
- 02 · Validate (minutes). Automatic PHI detection and redaction. Schema validation. Quality scoring with reject reasons.
- 03 · Fine-tune (≈ 4–6 hours). LoRA / QLoRA recipes pre-configured for medical SLM bases. Eval harness scores hallucination, faithfulness, and clinical accuracy.
- 04 · Evaluate (30 minutes). Side-by-side outputs vs base model. Manual SME review queue. Automated regression on your golden test set.
- 05 · Package (minutes). Quantised GGUF, ONNX, or vLLM-ready bundles. Includes model card, eval report, and deployment manifest.
- 06 · Deploy (same day). Pull a Docker image into your VPC, on-prem, or hospital intranet. CPU is enough.
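To make the Validate step concrete, here is a minimal sketch of pattern-based redaction plus a required-field check that returns reject reasons. It is not Evarx's detector: the patterns, the label names, and the `validate_record` helper are all hypothetical, and a production PHI pipeline needs much broader coverage than a few regular expressions.

```python
import re

# Illustrative patterns only (assumption): real PHI detection must also cover
# names, addresses, and free-text identifiers, and needs clinical validation.
PHI_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{10}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b"),
}


def redact(text):
    """Replace each match with a [LABEL] tag and count hits per label."""
    counts = {}
    for label, pattern in PHI_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts


def validate_record(record, required_fields=("id", "text")):
    """Return (ok, reject_reasons, cleaned_record) for one uploaded record."""
    reasons = [f"missing field: {f}" for f in required_fields if not record.get(f)]
    if reasons:
        return False, reasons, None
    cleaned_text, counts = redact(record["text"])
    cleaned = dict(record, text=cleaned_text, phi_counts=counts)
    return True, [], cleaned
```

A record that passes comes back with tagged text and a per-label count; a record that fails comes back with a list of reasons a reviewer can act on.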
Pick your size. They all run on CPU.
Three sizes of the Evarx Medical SLM serve as the base for your fine-tune. Quantised builds (Q4_K_M, Q5_K_M) ship for every size — even the 7B fits on a workstation.
| Model | Size | Recommended CPU | Latency (Q4_K_M) |
|---|---|---|---|
| Evarx-Med-1B | 1.1B params | 8 vCPU · 16 GB RAM | ~120ms / token |
| Evarx-Med-3B | 3.0B params | 16 vCPU · 32 GB RAM | ~180ms / token |
| Evarx-Med-7B | 7.2B params | 32 vCPU · 64 GB RAM | ~280ms / token |
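For a feel of what the latency column means in practice, the sketch below converts per-token latency into throughput and generation time for a fixed-length answer. It also gives a rough estimate of the weight footprint, which rests on an assumed average of about 4.85 bits per weight for Q4_K_M. That figure is our assumption, not a published Evarx number, and the estimate excludes the KV cache and runtime overhead.

```python
# Figures copied from the table above (Q4_K_M latency per generated token).
MODELS = {
    "Evarx-Med-1B": {"params_b": 1.1, "ms_per_token": 120},
    "Evarx-Med-3B": {"params_b": 3.0, "ms_per_token": 180},
    "Evarx-Med-7B": {"params_b": 7.2, "ms_per_token": 280},
}

# ASSUMPTION: average bits per weight for a Q4_K_M build, not an Evarx figure.
ASSUMED_BITS_PER_WEIGHT = 4.85


def tokens_per_second(ms_per_token):
    """Throughput implied by a per-token latency."""
    return 1000.0 / ms_per_token


def response_seconds(ms_per_token, output_tokens):
    """Generation time only; prompt processing is ignored."""
    return ms_per_token * output_tokens / 1000.0


def approx_weights_gb(params_b, bits_per_weight=ASSUMED_BITS_PER_WEIGHT):
    """Rough size of the weights alone, in GB (10^9 bytes)."""
    return params_b * bits_per_weight / 8


for name, m in MODELS.items():
    print(
        f"{name}: {tokens_per_second(m['ms_per_token']):.1f} tok/s, "
        f"{response_seconds(m['ms_per_token'], 200):.0f} s per 200 tokens, "
        f"~{approx_weights_gb(m['params_b']):.1f} GB weights"
    )
```

At the table's figures, a 200-token answer takes about 24 s on the 1B, 36 s on the 3B, and 56 s on the 7B, so choose a size based on how long your workflow's typical outputs are.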
Live demo
Watch a custom fine-tune run end-to-end.
Demo stages and animation timings: Upload (4s) · Validate (6s) · Fine-tune (8s) · Evaluate (7s) · Deploy (5s). Training engine: evarx-trainer · GPU.
What does private actually cost?
Live monthly cost across the three options. Tweak the sliders to see when a fine-tune pays for itself.
Example settings: 5 Cr (50 million) tokens / month · 20 seats · Custom SLM hosting runs on your CPU servers.
Custom vs Frontier: ₹75,750 saved / month, ≈ 98% lower than running everything on a frontier hosted API.
Monthly cost · INR. Indicative. Actuals depend on volume and contract.
| Option | Notes | Per 1M tokens | Fixed | Monthly cost |
|---|---|---|---|---|
| Frontier hosted API | Per-token pricing dominates at this volume. | ₹1,250 | ₹15,000 | ₹77,500 |
| Evarx Private (Medical SLM) | GPU-backed inference in your VPC. | ₹180 | ₹42,500 | ₹51,500 |
| Evarx Custom (Fine-tuned · CPU) | Runs on hardware you already own. | ₹35 | No fixed cost | ₹1,750 |
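The figures above follow a simple linear model: the per-million rate times monthly volume, plus a fixed fee. The sketch below reproduces them and adds a break-even helper. The function names are ours, and seat count is left out because the page does not show how seats enter the price.

```python
# Rates as shown in the calculator above (INR). Volume is in millions of tokens.
OPTIONS = {
    "Frontier hosted API": {"per_million": 1250, "fixed": 15000},
    "Evarx Private (Medical SLM)": {"per_million": 180, "fixed": 42500},
    "Evarx Custom (Fine-tuned, CPU)": {"per_million": 35, "fixed": 0},
}


def monthly_cost(option, million_tokens):
    """Linear cost model: per-token charge plus fixed monthly fee."""
    o = OPTIONS[option]
    return o["per_million"] * million_tokens + o["fixed"]


def break_even_million_tokens(a, b):
    """Volume where options a and b cost the same, or None if the two
    cost lines never cross at a positive volume."""
    rate_gap = OPTIONS[a]["per_million"] - OPTIONS[b]["per_million"]
    fixed_gap = OPTIONS[b]["fixed"] - OPTIONS[a]["fixed"]
    if rate_gap == 0:
        return None
    volume = fixed_gap / rate_gap
    return volume if volume > 0 else None


volume = 50  # 5 Cr tokens = 50 million tokens
costs = {name: monthly_cost(name, volume) for name in OPTIONS}
saved = costs["Frontier hosted API"] - costs["Evarx Custom (Fine-tuned, CPU)"]
saved_pct = 100 * saved / costs["Frontier hosted API"]

for name, cost in costs.items():
    print(f"{name}: INR {cost:,}")
print(f"Custom vs Frontier: INR {saved:,} saved ({saved_pct:.0f}% lower)")
```

Under these rates, the GPU-backed private option overtakes the frontier API only above roughly 25.7 million tokens a month. The CPU option has no fixed fee and the lowest per-token rate, so it is cheapest at every volume.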
Continuous improvement loop
Every workflow run that gets a thumbs-up — or an SME-edited correction — becomes a labelled training pair. Schedule nightly refresh runs and your model improves while you sleep. Roll back any version with one click.
- Versioned weights with git-style diffing
- Eval gates prevent regressions from shipping
- Per-team feedback isolation
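One way to picture an eval gate is as a comparison between a candidate's scores and the currently deployed version's, blocking promotion if any metric moves the wrong way by more than a tolerance. The sketch below is illustrative only: the metric names, the tolerance, and the `gate` function are hypothetical, not the Evarx API.

```python
# Hypothetical metric names (assumption); True means a higher score is better.
HIGHER_IS_BETTER = {
    "faithfulness": True,
    "clinical_accuracy": True,
    "hallucination_rate": False,
}


def gate(baseline, candidate, tolerance=0.01):
    """Return (passed, regressions).

    A metric counts as a regression when the candidate is worse than the
    baseline by more than `tolerance` in that metric's "bad" direction.
    """
    regressions = []
    for metric, higher_better in HIGHER_IS_BETTER.items():
        delta = candidate[metric] - baseline[metric]
        worse_by = -delta if higher_better else delta
        if worse_by > tolerance:
            regressions.append(
                f"{metric}: {baseline[metric]:.3f} -> {candidate[metric]:.3f}"
            )
    return (not regressions), regressions
```

A nightly refresh would call `gate` with scores from the golden test set and keep the previous version in place whenever `passed` is false.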
Why CPU-runnable matters
Hospitals, regulated pharma units, and air-gapped research sites can't always procure GPUs. Quantised SLMs let you ship the same model your data scientists trained onto a 16-core production server, with no infrastructure rewrite.
- Runs on existing hospital-grade hardware
- Air-gap deployable via signed Docker images
- Cost per token approaches zero at steady state
A fine-tune scoped, signed off, and running this week.
Tell us your highest-leverage workflow. We'll respond with a data spec, a fixed-price scope, and a deployment plan within a business day.
- NDA & DPA on request
- Fixed-price first fine-tune
- Includes one production agent
