Our USP · Custom SLM

Bring your data. Leave with a model.

A medical SLM, fine-tuned on your proprietary data, packaged to run on the CPUs you already own. No GPU procurement. No vendor lock-in. No ML team required.

Why fine-tune?

RAG retrieves. Fine-tune internalises.

Retrieval lets a generic model look up your data at inference time. Fine-tuning teaches your model the language, structure, and judgement of your domain — so every output is faster, cheaper, and grounded in how your team actually works.

  • Lower latency — no retrieval round-trip for routine reasoning.
  • Lower cost — smaller models replace expensive frontier API calls.
  • Stronger privacy — model weights stay in your environment.
  • Higher consistency — your tone, your formats, your terminology.

The lifecycle

  1. Upload · Minutes

    Drop in PDFs, EMR exports, structured tables, internal SOPs. We fingerprint and version everything.

  2. Validate · Minutes

    Automatic PHI detection and redaction. Schema validation. Quality scoring with reject reasons.

  3. Fine-tune · ≈ 4–6 hours

    LoRA / QLoRA recipes pre-configured for medical SLM bases. Eval harness scores hallucination, faithfulness, and clinical accuracy.

  4. Evaluate · 30 minutes

    Side-by-side outputs vs base model. Manual SME review queue. Automated regression on your golden test set.

  5. Package · Minutes

    Quantised GGUF, ONNX, or vLLM-ready bundles. Includes model card, eval report, and deployment manifest.

  6. Deploy · Same day

    Pull a Docker image into your VPC, on-prem, or hospital intranet. CPU is enough.
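
The "fingerprint and version everything" step in Upload can be pictured with a minimal sketch like the one below. It is not Evarx's implementation: the function names, the manifest shape, and the 12-character version ID are all assumptions made for illustration. The idea is simply that hashing file contents gives a stable identity, so a changed file produces a new dataset version.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths: list[Path]) -> dict[str, str]:
    """Map each file name to its content hash (hypothetical manifest shape)."""
    return {p.name: fingerprint(p) for p in sorted(paths)}


def dataset_version(manifest: dict[str, str]) -> str:
    """Derive a short version ID that does not depend on upload order."""
    digest = hashlib.sha256()
    for name, file_hash in sorted(manifest.items()):
        digest.update(f"{name}:{file_hash}\n".encode("utf-8"))
    return digest.hexdigest()[:12]  # truncation length is an arbitrary choice
```

Calling `dataset_version(build_manifest(files))` before and after an upload changes shows whether the training set itself changed, which is what makes a later eval report attributable to a specific data snapshot.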

Base models

Pick your size. They all run on CPU.

Three sizes of the Evarx Medical SLM serve as the base for your fine-tune. Quantised builds (Q4_K_M, Q5_K_M) ship for every size — even the 7B fits on a workstation.

Model          Size           Recommended CPU        Latency (Q4_K_M)
Evarx-Med-1B   1.1B params    8 vCPU · 16 GB RAM     ~120 ms / token
Evarx-Med-3B   3.0B params    16 vCPU · 32 GB RAM    ~180 ms / token
Evarx-Med-7B   7.2B params    32 vCPU · 64 GB RAM    ~280 ms / token
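
To get a feel for what the per-token latencies mean in practice, the short sketch below converts them into throughput and into generation time for a response. The latencies are copied from the table; the 200-token response length is an assumed example, and prompt-processing time is ignored.

```python
# Per-token latencies copied from the table above (Q4_K_M builds).
LATENCY_MS_PER_TOKEN = {
    "Evarx-Med-1B": 120,
    "Evarx-Med-3B": 180,
    "Evarx-Med-7B": 280,
}


def tokens_per_second(ms_per_token: float) -> float:
    """Convert per-token latency into generation throughput."""
    return 1000.0 / ms_per_token


def response_seconds(ms_per_token: float, output_tokens: int) -> float:
    """Generation time only; prompt processing is ignored (an assumption)."""
    return ms_per_token * output_tokens / 1000.0


if __name__ == "__main__":
    for model, ms in LATENCY_MS_PER_TOKEN.items():
        print(
            f"{model}: {tokens_per_second(ms):.1f} tok/s, "
            f"{response_seconds(ms, 200):.0f} s for 200 tokens"
        )
```

At these figures a 200-token answer takes roughly 24 s on the 1B, 36 s on the 3B, and 56 s on the 7B, which is worth weighing against the accuracy gain when picking a size.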

Live demo

Watch a custom fine-tune run end-to-end.

[Interactive demo: an animated fine-tune run through five stages (Upload, Validate, Fine-tune, Evaluate, Deploy), with a live log and readouts for stage, tokens / sec, and engine (evarx-trainer · GPU).]

TCO calculator

What does private actually cost?

Live monthly cost across the three options. Tweak the sliders to see when a fine-tune pays for itself.

Example inputs: 50M (5 Cr) tokens / month · 20 seats

Custom SLM hosting

Runs on your CPU servers

Custom vs Frontier

₹75,750 saved / month

98% lower than running everything on a frontier hosted API.

Monthly cost · INR

Indicative. Actuals depend on volume and contract.

  • Frontier hosted API

    Per-token pricing dominates at this volume.

    ₹77,500

    ₹1,250 / 1M tokens + ₹15,000 fixed
  • Evarx Private (Medical SLM)

    GPU-backed inference in your VPC.

    ₹51,500

    ₹180 / 1M tokens + ₹42,500 fixed
  • Evarx Custom (Fine-tuned · CPU)

    Runs on hardware you already own.

    ₹1,750

    ₹35 / 1M tokens · no fixed cost
Custom-tier on-prem deployments reach cost parity within ~3 months at this volume.
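
The monthly figures above follow from a per-token rate plus a fixed fee. The sketch below reproduces that arithmetic using the rates shown on this page; the `Plan` class and function names are illustrative, and the seat count is left out because it does not appear in the displayed rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    inr_per_million_tokens: float
    fixed_inr_per_month: float

    def monthly_cost(self, million_tokens: float) -> float:
        """Variable cost plus fixed fee, in INR per month."""
        return self.inr_per_million_tokens * million_tokens + self.fixed_inr_per_month


# Rates as listed in the calculator above.
FRONTIER = Plan("Frontier hosted API", 1250, 15000)
PRIVATE = Plan("Evarx Private (Medical SLM)", 180, 42500)
CUSTOM = Plan("Evarx Custom (Fine-tuned, CPU)", 35, 0)


def savings(baseline: Plan, candidate: Plan, million_tokens: float) -> tuple[float, float]:
    """Return (INR saved per month, percent saved) versus the baseline plan."""
    base = baseline.monthly_cost(million_tokens)
    saved = base - candidate.monthly_cost(million_tokens)
    return saved, 100.0 * saved / base


if __name__ == "__main__":
    volume = 50  # million tokens per month (5 Cr)
    for plan in (FRONTIER, PRIVATE, CUSTOM):
        print(f"{plan.name}: INR {plan.monthly_cost(volume):,.0f}")
    saved, pct = savings(FRONTIER, CUSTOM, volume)
    print(f"Custom vs Frontier: INR {saved:,.0f} saved ({pct:.0f}% lower)")
```

At 50M tokens this gives ₹77,500, ₹51,500, and ₹1,750 respectively, and a saving of ₹75,750 (about 98%) for Custom versus Frontier. Changing `volume` shows how quickly the per-token term dominates the frontier option.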

Continuous improvement loop

Every workflow run that gets a thumbs-up — or an SME-edited correction — becomes a labelled training pair. Schedule nightly refresh runs and your model improves while you sleep. Roll back any version with one click.

  • Versioned weights with git-style diffing
  • Eval gates prevent regressions from shipping
  • Per-team feedback isolation
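
An eval gate of the kind listed above can be pictured as a comparison between the current model's scores and a candidate's scores on the same golden test set. The sketch below is a simplified illustration, not the product's actual gate: the metric names, the 0.01 tolerance, and the assumption that higher scores are better are all hypothetical.

```python
def passes_eval_gate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_drop: float = 0.01,
) -> tuple[bool, list[str]]:
    """Block a candidate if any metric falls more than max_drop below baseline.

    Assumes every metric is "higher is better"; a lower-is-better metric such
    as a hallucination rate would need to be inverted before being passed in.
    """
    failures: list[str] = []
    for metric, base_score in baseline.items():
        new_score = candidate.get(metric)
        if new_score is None:
            failures.append(f"{metric}: missing from candidate")
        elif new_score < base_score - max_drop:
            failures.append(f"{metric}: {base_score:.3f} -> {new_score:.3f}")
    return (not failures, failures)


if __name__ == "__main__":
    current = {"faithfulness": 0.90, "clinical_accuracy": 0.85}
    nightly = {"faithfulness": 0.91, "clinical_accuracy": 0.80}
    ok, reasons = passes_eval_gate(current, nightly)
    print("ship" if ok else f"blocked: {reasons}")
```

In this example the nightly candidate improves faithfulness but drops clinical accuracy by more than the tolerance, so it is blocked and the previous version stays live.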

Why CPU-runnable matters

Hospitals, regulated pharma units, and air-gapped research sites can't always procure GPUs. Quantised SLMs let you ship the same model your data scientists trained onto a 16-core production server, with no infrastructure rewrite.

  • Runs on existing hospital-grade hardware
  • Air-gap deployable via signed Docker images
  • Cost per token approaches zero at steady state
Get started

A fine-tune scoped, signed off, and running this week.

Tell us your highest-leverage workflow. We'll respond with a data spec, a fixed-price scope, and a deployment plan within a business day.

  • NDA & DPA on request
  • Fixed-price first fine-tune
  • Includes one production agent
Book a demo