Limitations
What this project does not claim. Read this before citing or deploying anything here.
Not for clinical use. NaviMed-UMB is a research and engineering platform. It is not validated for diagnosis, treatment, triage, or any patient-care decision. Nothing here is a medical device.
No clinical-accuracy measurement
Layer 1 measures the inference vehicle. It does not measure what the model says. There is no scoring of factual accuracy, reasoning, or clinical correctness on this site. Those belong to the Layer 3 arena and to separate validation studies.
Benchmark is not model quality
Throughput, latency, thermal behaviour, and power draw are properties of the hardware and the serving stack. A fast model is not necessarily a good model. The benchmark cannot tell a careful answer from a confident wrong one.
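As a minimal sketch of why this holds (not the project's benchmark harness; the function names and sample numbers are illustrative assumptions), throughput and energy per token can be computed from a token count, a wall-clock time, and an average power reading alone. The generated text never enters the calculation:

```python
def throughput_tok_per_s(n_tokens: int, elapsed_s: float) -> float:
    """Generated tokens per second of wall-clock time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_tokens / elapsed_s


def energy_per_token_j(avg_power_w: float, elapsed_s: float, n_tokens: int) -> float:
    """Joules per generated token: average power (W) x time (s) / tokens."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return avg_power_w * elapsed_s / n_tokens


# Hypothetical run: 512 tokens in 8 s at an average draw of 300 W.
# These are made-up numbers, not measurements from this project.
tps = throughput_tok_per_s(512, 8.0)
jpt = energy_per_token_j(300.0, 8.0, 512)
print(f"{tps:.1f} tok/s, {jpt:.2f} J/token")
```

Two responses with the same token count and timing produce identical figures here, whether or not either one is correct.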
The calibration corpus is not patient data
The Polish SmPC corpus is public drug-information text published by the European Medicines Agency. It contains no patient health information. It is a quantization calibration set, not a clinical dataset.
Local inference depends on deployment hygiene
"No patient data leaves the building" holds for on-premise inference with no external telemetry or API calls. It is a property of how you deploy, not a guarantee the software grants on its own. Model downloads, framework telemetry, logging, crash reports, and system updates are the operator's responsibility to control.
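One way an operator might start on that list is to force offline mode and opt out of usage reporting before launching the server. The sketch below assumes a typical vLLM plus Hugging Face setup; it is not this project's deployment script. The environment variables are the opt-outs documented by huggingface_hub, transformers, and vLLM, and the helper name `offline_env` is hypothetical. It covers framework telemetry and model downloads only; logging, crash reports, and system updates still need separate controls such as firewall rules.

```python
import os

# Documented opt-out / offline switches (verify against the versions you run).
OFFLINE_ENV = {
    "HF_HUB_OFFLINE": "1",            # huggingface_hub: no network calls to the Hub
    "TRANSFORMERS_OFFLINE": "1",      # transformers: use local files only
    "HF_HUB_DISABLE_TELEMETRY": "1",  # huggingface_hub: no usage telemetry
    "VLLM_NO_USAGE_STATS": "1",       # vLLM: no usage-stats collection
    "DO_NOT_TRACK": "1",              # generic opt-out that vLLM also honours
}


def offline_env(base=None):
    """Return a copy of `base` (default: os.environ) with the opt-outs applied."""
    env = dict(os.environ if base is None else base)
    env.update(OFFLINE_ENV)
    return env


env = offline_env()
# Pass `env` to the process that starts the inference server, e.g.
# subprocess.run([...], env=env). The command itself is deployment-specific.
```

Setting these variables reduces outbound traffic from the frameworks, but it does not prove that nothing leaves the machine. Network-level enforcement remains the stronger check.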
Performance numbers are embargoed
Per-N throughput, latency distributions, energy per token, and cross-model comparisons are held back until peer-reviewed publication. The public figures are the engineering envelope only. See Results.
Single-workstation external validity
Every figure comes from one machine: two AMD Radeon AI PRO R9700 cards under one pinned stack. Results may not transfer to other GPUs, driver versions, or vLLM builds. This is a depth-over-breadth study, not a survey.
Quantization is backend-specific
The AWQ behaviour reported here is specific to gfx1201 (RDNA 4), ROCm 7.2.0, and vLLM 0.19.0. Kernel support on this stack is still maturing. The same checkpoints may behave differently on NVIDIA hardware or on newer ROCm releases.
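Anyone trying to reproduce the reported behaviour may want to confirm the stack first. The following is a minimal sketch, not part of this project: it compares the installed vLLM package against the version named above using the standard-library `importlib.metadata`. The names `PINNED`, `installed_versions`, and `version_mismatches` are hypothetical. ROCm and the GPU architecture are not Python packages, so they are out of scope here, and the exact string match is deliberately strict: a build whose version carries a local suffix will be flagged.

```python
from importlib import metadata

# Pinned Python-package versions taken from the text above.
# ROCm and the GPU architecture are not Python packages and are not checked here.
PINNED = {"vllm": "0.19.0"}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(pinned, found):
    """Return {name: (expected, actual)} for every package that differs."""
    return {
        name: (want, found.get(name))
        for name, want in pinned.items()
        if found.get(name) != want
    }


mismatches = version_mismatches(PINNED, installed_versions(PINNED))
for name, (want, got) in mismatches.items():
    print(f"{name}: expected {want}, found {got}; reported results may not transfer")
```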
"First public" is a best-effort claim
The model cards say "to the author's knowledge, first public" for each AWQ variant. This rests on a search of the public Hugging Face Hub on the release date. It is not a registry-level claim and cannot be proven in the strict sense.