NaviMed-UMB

Local clinical-AI inference on consumer AMD GPUs. Polish LLM releases and an open benchmark method.

Clinical LLMs on hardware you own

NaviMed-UMB runs open-weight language models on a consumer workstation with two AMD Radeon AI PRO R9700 GPUs. It ships a benchmark suite, public model cards, and a documented method. Inference runs on-premise. No patient data leaves the building.

That privacy claim holds only when inference runs on-premise with no external telemetry or API calls. Deployment hygiene is the operator's job. See Limitations.
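One piece of that deployment hygiene can be sketched in code. The check below is a minimal illustration, not part of the NaviMed-UMB release. It assumes the operator's client talks to the model over an HTTP endpoint and refuses any endpoint URL whose host is not a literal loopback or private address. The function name `is_local_endpoint` is hypothetical. It inspects only the configured URL. It does not detect telemetry or other outbound traffic, which still needs network-level controls.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True if the URL's host is 'localhost' or a literal
    loopback/private IP address. Hypothetical helper for illustration."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Any other hostname would need DNS to resolve, so reject it
        # rather than guess where it points.
        return False
    return addr.is_loopback or addr.is_private
```

A client could call this once at startup and exit if it returns False. Rejecting every non-literal hostname is deliberately strict. An operator with internal DNS would need a different rule.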

Clinical safety boundary. NaviMed-UMB is a research and engineering platform for local inference and benchmarking. It is not validated for diagnosis, treatment, triage, or any patient-care decision. Model quality and clinical accuracy are out of scope here. They are to be evaluated separately in Layer 3, which is still in scoping. See Limitations.

The platform is Layer 1 of a three-layer design for on-premise medical AI. It is the layer that runs today.

If you read one thing.

  • Layer 1 is implemented and released (v0.4.0).
  • Layers 2 and 3 are scoped.
  • Public: ten model cards, one calibration dataset, the method, and the reproducibility stack.
  • Performance numbers are embargoed until publication.
  • No clinical-accuracy claim is made here.

The question

What does it take to run a privacy-sensitive clinical model on hardware a department can buy? Two consumer GPUs. On premise. No patient data leaves the building. This site answers the hardware half of that question and releases the models produced in the process.

How it fits together

Clinical SmPC corpus (EMA, no PHI)
  → AWQ W4A16 calibration
  → Quantized model cards (HuggingFace)
  → Local serving: vLLM, ROCm, R9700
  → Benchmark harness
  → Engineering envelope (public)
  → Layer 2 retrieval and Layer 3 arena (scoped)

Hardware

AMD Ryzen 9 9950X3D. 2× GIGABYTE Radeon AI PRO R9700 32 GB (gfx1201, RDNA 4). 96 GB DDR5-6000. Kubuntu 24.04, kernel 6.17, ROCm 7.2.0, vLLM 0.19.0.
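Why 4-bit weights matter on 32 GB cards comes down to simple arithmetic. The sketch below estimates the weight footprint of a group-wise quantized model. It is a rough illustration, not a figure from this project. The defaults (group size 128, one 16-bit scale and one 4-bit zero point per group) are assumptions about a typical AWQ configuration. The function name `awq_weight_gib` is hypothetical. The estimate ignores the KV cache, activations, and any layers left unquantized, so real memory use is higher.

```python
def awq_weight_gib(
    n_params: float,
    weight_bits: int = 4,     # the "W4" in W4A16
    group_size: int = 128,    # assumed quantization group size
    scale_bits: int = 16,     # assumed: one FP16 scale per group
    zero_bits: int = 4,       # assumed: one 4-bit zero point per group
) -> float:
    """Approximate weight footprint in GiB for group-wise quantized weights."""
    # Each weight carries its own bits plus its share of the per-group metadata.
    bits_per_weight = weight_bits + (scale_bits + zero_bits) / group_size
    return n_params * bits_per_weight / 8 / 2**30


def fp16_weight_gib(n_params: float) -> float:
    """Weight footprint in GiB at 16 bits per parameter, for comparison."""
    return n_params * 2 / 2**30
```

Under these assumptions each weight costs about 4.16 bits instead of 16, so the weights shrink to roughly a quarter of their FP16 size. What remains of each card's memory goes to the KV cache and activations, which this sketch does not model.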

Releases (v0.4.0)

Three layers

Each layer is a separate contribution. This site documents Layer 1.

L1 · Workstation

The hardware envelope, the benchmark platform, and the AWQ releases. Released as v0.4.0.

L2 · Retrieval

A clinical retrieval layer over the Polish SmPC corpus. Hybrid retrieval, not a single vector store. In scoping.

L3 · Arena

A model-quality evaluation method, kept separate from the throughput work. In scoping.

Scope

The benchmark measures throughput, latency, thermal envelope, and power under concurrent load. It does not measure model quality, reasoning, or clinical accuracy. A confidently wrong model runs as fast as a correct one. The benchmark cannot tell them apart. Quality belongs to the L3 Arena and to clinical validation. Not here.
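To make the measured quantities concrete, here is a minimal sketch of how throughput, latency percentiles, and energy per token could be derived from per-request timing records. It is not the project's harness. The record fields, the function name `summarize`, and the use of a single mean power reading for the run are all assumptions made for illustration. Nothing in it looks at what the model said, which is the point of the scope statement above.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class RequestRecord:
    """Timing for one completed request (hypothetical record layout)."""
    start_s: float       # when the request was sent
    end_s: float         # when the last token arrived
    output_tokens: int   # tokens generated for this request


def summarize(records: list[RequestRecord], mean_power_w: float) -> dict[str, float]:
    """Aggregate throughput, latency percentiles, and energy per token.

    Needs at least two records. mean_power_w is assumed to be the average
    power draw over the whole run, measured separately.
    """
    wall_s = max(r.end_s for r in records) - min(r.start_s for r in records)
    tokens = sum(r.output_tokens for r in records)
    latencies = [r.end_s - r.start_s for r in records]
    # 99 cut points; index 49 is the median, index 94 the 95th percentile.
    cuts = quantiles(latencies, n=100, method="inclusive")
    return {
        "throughput_tok_s": tokens / wall_s,
        "latency_p50_s": cuts[49],
        "latency_p95_s": cuts[94],
        "energy_j_per_token": mean_power_w * wall_s / tokens,
    }
```

Throughput is taken over the wall-clock window of the whole run, not summed per request, so overlapping concurrent requests are counted once. A confidently wrong answer and a correct one produce the same record.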

Public now vs held back

Public now             | Held until the papers
-----------------------|------------------------
Model cards            | Throughput scaling
Calibration corpus     | Latency distributions
Hardware envelope      | Energy per token
Reproducibility stack  | Cross-model comparisons
Methodology            | Paper-bound figures

What this site holds back. Performance numbers stay private until the papers are published. Throughput, latency distributions, energy per token, and cross-model comparisons are not here. The engineering side is public: hardware limits, the method, the model cards, and how to reproduce them.