Clinical LLMs on hardware you own
NaviMed-UMB runs open-weight language models on a consumer dual AMD Radeon AI PRO R9700 workstation. It ships a benchmark suite, public model cards, and a documented method. Inference runs on-premise. No patient data leaves the building.
That privacy claim holds for on-premise inference with no external telemetry or API calls. Deployment hygiene is the operator's job. See Limitations.
Clinical safety boundary. NaviMed-UMB is a research and engineering platform for local inference and benchmarking. It is not validated for diagnosis, treatment, triage, or any patient-care decision. Model quality and clinical accuracy are out of scope here. They are evaluated separately in Layer 3. See Limitations.
The platform is Layer 1 of a three-layer design for on-premise medical AI. It is the layer that runs today.
If you read one thing.
- Layer 1 is implemented and released (v0.4.0).
- Layers 2 and 3 are scoped.
- Public: ten model cards, one calibration dataset, the method, and the reproducibility stack.
- Performance numbers are embargoed until publication.
- No clinical-accuracy claim is made here.
The question
What does it take to run a privacy-sensitive clinical model on hardware a department can buy? Two consumer GPUs. On premise. No patient data leaves the building. This site answers the hardware half of that question. It releases the models it produced.
How it fits together
Hardware
AMD Ryzen 9 9950X3D. 2× GIGABYTE Radeon AI PRO R9700 32 GB (gfx1201, RDNA 4). 96 GB DDR5-6000. Kubuntu 24.04, kernel 6.17, ROCm 7.2.0, vLLM 0.19.0.
Releases (v0.4.0)
- Ten HuggingFace model cards. The Llama-PLLuM-70B family in eight AWQ variants for two-card deployment, plus two variants that fit on a single R9700. To the author's knowledge, the first public AWQ W4A16 quantization of each. See Models.
- One calibration dataset. A reusable Polish clinical SmPC corpus. 418 fragments of EMA-published text. No PHI.
- One Zenodo deposit. Concept DOI 10.5281/zenodo.19851346, current version v0.4.0.
Three layers
Each layer is a separate contribution. This site documents Layer 1.
L1 · Workstation
The hardware envelope, the benchmark platform, and the AWQ releases. Released as v0.4.0.
L2 · Retrieval
A clinical retrieval layer over the Polish SmPC corpus. Hybrid retrieval, not a single vector store. In scoping.
L3 · Arena
A model-quality evaluation method, kept separate from the throughput work. In scoping.
Scope
The benchmark measures throughput, latency, thermal envelope, and power under concurrent load. It does not measure model quality, reasoning, or clinical accuracy. A confidently wrong model runs as fast as a correct one. The benchmark cannot tell them apart. Quality belongs to the L3 Arena and to clinical validation. Not here.
Public now vs held back
| Public now | Held until the papers |
|---|---|
| Model cards | Throughput scaling |
| Calibration corpus | Latency distributions |
| Hardware envelope | Energy per token |
| Reproducibility stack | Cross-model comparisons |
| Methodology | Paper-bound figures |