Announcing the Release of the Curia Benchmark and Evaluation Code
October 22, 2025 – Following the release of Curia, our multi-modal foundation model for radiology, we are excited to announce the public release of CuriaBench, our 19-task benchmark. CuriaBench gives the community a robust tool for comparing Curia against state-of-the-art models. It covers a broad spectrum of clinical cases, representative of the variety of radiologists’ use cases, and allows for the exploration of generalization in few-shot and cross-modal settings.

CuriaBench: A Benchmark Reflecting the Spectrum of Radiologists’ Clinical Cases
CuriaBench comprises 19 distinct radiological tasks that span both CT and MRI modalities and cover most anatomical regions. A demo of each task can be found in our introduction to Curia.
The benchmark's tasks fall into several categories:
Anatomical Tasks: These evaluate the model's ability to identify organs across various body regions and demonstrate its cross-modality generalization capabilities. Tasks include MRI and CT Organ Recognition, Neuroimaging Age Estimation, and Image Registration.
Oncology: Focusing on cancer-related challenges, this category includes Lung Nodule and Kidney Lesion Malignancy classification, Tumor Anatomical Site identification, and Kidney Cancer Survival prediction.
Musculoskeletal: Tasks here involve assessing degenerative diseases of the lumbar spine, specifically Foraminal Narrowing, Subarticular Stenosis, and Spinal Canal Stenosis, as well as detecting ACL tears from knee MRI.
Emergency: This high-stakes category evaluates the detection of acute conditions such as Intracranial Hemorrhages, Myocardial Infarctions, Abdominal Trauma, and Signs of a Stroke.
Infectious: This part of the benchmark focuses on detecting Pulmonary Infections from chest CT scans, including COVID-19 and non-COVID pneumonia.
Neurodegenerative: This task involves predicting Alzheimer's disease from brain MRI images.
We evaluated Curia's performance against two leading models in the field, BiomedCLIP and MedImageInsight. Our analysis shows that Curia consistently matches or surpasses these models, and on most benchmark tasks it delivers performance comparable to, or exceeding, that of senior resident radiologists.
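For readers curious about the mechanics of such comparisons, a common recipe for benchmarking frozen foundation models is linear probing: each encoder's embeddings feed a lightweight classifier, and models are ranked by the probe's score. The sketch below illustrates that generic recipe with scikit-learn; the random arrays are stand-ins for real embeddings, and the exact protocol used in our paper lives in the released codebase, which may differ from this simplification.

```python
# A generic frozen-feature linear-probing recipe, commonly used to compare
# foundation models on a classification task. The arrays below are random
# stand-ins for embeddings; the actual CuriaBench protocol is defined in the
# released codebase and may differ from this sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical embeddings from a frozen encoder (e.g. Curia, BiomedCLIP, or
# MedImageInsight) on one task's train/test split, with binary labels.
train_emb = rng.normal(size=(512, 768))
test_emb = rng.normal(size=(128, 768))
train_y = rng.integers(0, 2, size=512)
test_y = rng.integers(0, 2, size=128)

probe = LogisticRegression(max_iter=1000).fit(train_emb, train_y)
auroc = roc_auc_score(test_y, probe.predict_proba(test_emb)[:, 1])
print(f"frozen-feature linear-probe AUROC: {auroc:.3f}")
```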

Releasing the Evaluation Codebase and Trained Heads
To ensure full reproducibility and accelerate community research, we're releasing the evaluation codebase for CuriaBench. This release enables you to:
Reproduce Curia's results on the benchmark, as reported in our paper.
Evaluate Curia on your own data.
Use Curia for a specific downstream task without training your own head (see the sketch after this list).
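As a taste of that last point, here is a minimal, hypothetical sketch of the no-training workflow: a frozen backbone produces an embedding, and a released head maps it to task probabilities. Every name below (the stand-in backbone, the head dimensions, the heads/lung_nodule_linear.pt path) is an illustrative placeholder rather than the repository's actual API; the codebase documents the real entry points.

```python
# Hypothetical sketch of applying a released head without any fine-tuning.
# All names and shapes are illustrative placeholders; consult
# https://github.com/raidium-med/curia for the actual entry points.
import torch
from torch import nn

# Stand-in for the frozen Curia encoder: any module that maps a preprocessed
# volume to a pooled embedding. Replace with the real loader from the repo.
backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(768)).eval()

# A task head; in practice you would restore released weights, e.g.
#   head.load_state_dict(torch.load("heads/lung_nodule_linear.pt"))  # hypothetical path
head = nn.Linear(768, 2).eval()

volume = torch.randn(1, 1, 16, 64, 64)  # toy-sized CT/MRI volume, (B, C, D, H, W)
with torch.no_grad():
    embedding = backbone(volume)             # (1, 768) pooled representation
    probs = head(embedding).softmax(dim=-1)  # task probabilities, no training done
print(probs)
```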
We are releasing multiple types of heads, adapted to different problems: linear heads for simple classification tasks, and attention-based heads designed to pick up more subtle signs of pathology in images. We are always eager to learn more about how Curia serves the community, so we invite you to share your results with us! Check out the code here: https://github.com/raidium-med/curia
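To make the two head families concrete, the sketch below shows roughly what each looks like on top of frozen embeddings. The dimensions and module structure here are illustrative assumptions; the released heads define the exact architectures.

```python
# Minimal sketches of the two head families described above. Dimensions and
# structure are illustrative assumptions, not the exact architectures shipped
# in the repository.
import torch
from torch import nn

class LinearHead(nn.Module):
    """A single linear layer over a pooled embedding: cheap and strong when
    the signal is global (e.g. organ recognition)."""
    def __init__(self, dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:  # (B, dim)
        return self.fc(pooled)

class AttentionHead(nn.Module):
    """Learned attention pooling over per-patch token embeddings, letting the
    head focus on the small regions where subtle pathology hides."""
    def __init__(self, dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.score = nn.Linear(dim, 1)        # one attention logit per token
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:  # (B, N, dim)
        attn = self.score(tokens).softmax(dim=1)  # (B, N, 1) weights over tokens
        pooled = (attn * tokens).sum(dim=1)       # (B, dim) weighted average
        return self.fc(pooled)

tokens = torch.randn(2, 196, 768)              # toy batch of token embeddings
print(AttentionHead()(tokens).shape)           # -> torch.Size([2, 2])
print(LinearHead()(tokens.mean(dim=1)).shape)  # -> torch.Size([2, 2])
```

The attention head weights each token before pooling, so a few abnormal patches can dominate the prediction even when most of the volume looks normal.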
Applying Curia to New Tasks: The FOMO and UNICORN Challenges
To further demonstrate Curia’s versatility and generalization capabilities, we recently participated in two challenges hosted around MICCAI 2025: the FOMO and UNICORN challenges.
FOMO 2025 (Foundation Model Challenge for Brain MRI): This challenge focused on few-shot generalization and the impact of self-supervised pre-training on downstream performance in the brain MRI domain. The tasks included infarct detection, meningioma segmentation, and brain age estimation. We competed in the open track, which allowed us to leverage our foundation model trained on any combination of data.
UNICORN 2025 (Unified beNchmark for Imaging in Computational pathology, Radiology and Natural language): This challenge aimed to establish a comprehensive public benchmark for multimodal foundation models in medical imaging. It used a "one-to-many" approach, assessing how a single model could adapt to a variety of tasks across radiology and pathology.
A key takeaway from our participation was demonstrating how rapidly and effectively Curia can be deployed to tackle completely new challenges. We believe the model’s versatility is its greatest strength, and we're highly motivated to see its capabilities further explored. If you are using Curia for novel applications or tasks, we strongly encourage you to connect with our team to share your use cases!

Access the Benchmark and More
Download the benchmark from HuggingFace
Download the evaluation codebase from GitHub
Read our preprint on arXiv
Stay tuned for more updates!
Thank you to all contributors of the datasets we used to develop our benchmark: Wasserthal (TotalSegmentator CT, TotalSegmentator CT & MRI), D'Antonoli (TotalSegmentator MRI), the Brain Development Project (IXI Dataset), Setio (LUNA16), Heller (KiTS23), Yan (DeepLesion), Štajduhar (KneeMRI dataset), the MICCAI 2020 challenge (EMIDEC), Liew (ATLAS v2.0), Marcus (OASIS-1), Gunraj (COVIDx CT), Hering (Learn2Reg Challenge), Segars (XCAT Phantom), Ji (AMOS22), Clark (TCGA-KIRC / TCIA), RSNA Challenge 2024 (RSNA 2024 Lumbar Spine Degenerative Classification), RSNA Challenge 2023 (RSNA 2023 Abdominal Trauma Detection), and RSNA Challenge 2019 (RSNA ICH Detection).