One Model, Any Task: Why Foundation Models Are Changing Radiology Workflows
May 27, 2026
For years, the conversation around AI in radiology has moved fast. New tools, new benchmarks, new promises. However, despite all of this activity, many radiologists working through a busy day still switch between a handful of narrow applications, each solving one isolated task, none of them talking to each other. The workflow stays fragmented and the cognitive load stays with the doctor. Foundation models represent a fundamentally different starting point. They are not another tool added to the stack, but a shift in how AI for radiology is built from the ground up. Understanding what they are, and why they matter for imaging AI, is the first step toward understanding where the field is heading. This article breaks down what foundation models actually are, why radiology needs its own, and how they differ from large language models and the narrow tools already deployed across departments.
I. What is a foundation model in radiology?
In 2022, Raidium published a post arguing that foundation models would change radiology, and that Raidium was building toward that future. At the time, the idea of a single model capable of generalizing across imaging tasks was still a research ambition. Since then, the landscape has shifted. Foundation models in radiology are no longer a prediction. In 2026 the reality is not a single foundation model, but a foundation model-based approach: multiple models, orchestrated together, in an era where agentic AI makes it possible to address entire workflows, not just individual tasks. In practice, this means a system that can gather the clinical context for a patient, detect lesions, identify which ones are new compared to prior exams, and generate a structured report adapted to that patient’s history, instead of solving a single computer vision task, such as lesion detection, in isolation.
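The workflow described above (gather context, detect lesions, flag what is new against a prior exam, draft a structured report) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Raidium's implementation: the `Lesion` type, `find_new_lesions`, `draft_report`, and the 10 mm matching threshold are all hypothetical, and lesion positions are assumed to be already registered to a shared coordinate frame.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Lesion:
    """Hypothetical detector output, in a shared patient coordinate frame."""
    center_mm: tuple
    diameter_mm: float


def find_new_lesions(current, prior, max_shift_mm=10.0):
    """Return lesions in `current` with no prior lesion within max_shift_mm."""
    return [
        lesion
        for lesion in current
        if all(dist(lesion.center_mm, old.center_mm) > max_shift_mm for old in prior)
    ]


def draft_report(context, current, prior):
    """Assemble a minimal structured report from the steps above."""
    new = find_new_lesions(current, prior)
    lines = [
        f"Clinical context: {context}",
        f"Lesions detected: {len(current)}",
        f"New since prior exam: {len(new)}",
    ]
    for lesion in new:
        lines.append(f"- new lesion, {lesion.diameter_mm:.1f} mm")
    return "\n".join(lines)


# Made-up example data: one lesion persists, one appears.
prior_exam = [Lesion((10.0, 20.0, 30.0), 8.0)]
current_exam = [
    Lesion((11.0, 21.0, 30.0), 9.5),
    Lesion((80.0, 40.0, 55.0), 6.0),
]
report = draft_report("follow-up, known lung nodule", current_exam, prior_exam)
print(report)
```

In a real system each step would be backed by a model rather than a distance threshold; the point of the sketch is only that the steps share data and run as one pipeline instead of as separate tools.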
To understand foundation models, it helps to start with how most AI tools in use today were built. Traditional machine learning models are designed around a specific task. Labeled data is gathered, a model is trained on that data, and the model is deployed to do one thing well. A model trained to detect pulmonary nodules detects pulmonary nodules, and that is all it is able to do. It cannot segment liver lesions, nor can it support report drafting. It solves the problem it was built for, and nothing else. In the field, these are called narrow, or task-specific, models.

Figure 1: Overview of the RadImageNet-VQA dataset.
Foundation models work differently. They are trained on very large amounts of diverse, often unlabeled data before being adapted to specific tasks. Rather than learning to solve one problem, they build a rich, general representation of patterns, structures, and relationships, and this pre-training phase is where the real depth is built.
To make this concrete, consider how a child learns. Before anyone teaches them physics, they build an intuitive sense of gravity by watching objects fall. Before formal language lessons, they develop early speech through observation and interaction with their surroundings. Learning happens continuously, from varied experience, without a supervisor labeling every input. Targeted teaching comes later, and it works because it builds on a foundation that is already there.

Figure 2: From pre-training to a foundation model-based approach: one shared backbone, many specializations.
Foundation models follow a similar principle at scale. Pre-trained on enormous datasets, they develop a broad, generalized understanding. That foundation can then be adapted toward specific tasks with far less data and effort than building a narrow model from scratch. Scale matters here. A classic supervised model like ResNet-50 has around 25 million parameters.
Foundation models operate at a scale orders of magnitude larger, and that scale is directly tied to their ability to generalize across tasks that were never part of their original training. To give a concrete sense of what this means in radiology: Curia, Raidium's first foundation model, was pre-trained on over 200 million CT and MRI slices and evaluated across a 19-task benchmark spanning both modalities. But a single model is only the starting point. The real shift comes from a foundation model-based approach: multiple models, each specialized for a different imaging challenge, orchestrated together to address the full complexity of radiology workflows.
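For readers who want to see where the "around 25 million parameters" figure for ResNet-50 comes from, the count can be reproduced by hand. The sketch below assumes the standard ImageNet layout (a 7x7 stem convolution, bottleneck stages of 3, 4, 6 and 3 blocks, bias-free convolutions each followed by batch normalization, and a 1000-class linear head); the helper names are illustrative and not taken from any library.

```python
def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution without bias."""
    return c_in * c_out * k * k


def bn_params(channels):
    """Learnable scale and shift of a batch-norm layer."""
    return 2 * channels


def bottleneck_params(c_in, width, downsample):
    """1x1 -> 3x3 -> 1x1 bottleneck block with 4x channel expansion."""
    c_out = 4 * width
    total = conv_params(c_in, width, 1) + bn_params(width)
    total += conv_params(width, width, 3) + bn_params(width)
    total += conv_params(width, c_out, 1) + bn_params(c_out)
    if downsample:
        # Projection shortcut used by the first block of each stage.
        total += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return total


def resnet50_params(num_classes=1000):
    total = conv_params(3, 64, 7) + bn_params(64)  # stem
    c_in = 64
    for width, blocks in [(64, 3), (128, 4), (256, 6), (512, 3)]:
        for i in range(blocks):
            total += bottleneck_params(c_in, width, downsample=(i == 0))
            c_in = 4 * width
    total += c_in * num_classes + num_classes  # final linear layer
    return total


print(f"{resnet50_params():,}")  # 25,557,032
```

Under these assumptions the total is 25,557,032, which is the "around 25 million" quoted above. A model that is orders of magnitude larger sits in the hundreds of millions to billions of parameters.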
II. Why does radiology need its own foundation models?
Foundation models already exist. Large, capable, general-purpose systems are widely available. So why can't radiology simply borrow from what already exists? The answer comes down to the nature of the data itself.
The field has already recognized this. A growing number of radiology-specific foundation models have emerged in recent years, each making a deliberate choice to move beyond general-purpose systems. Rad1, developed by Harrison.ai, focuses on chest X-ray interpretation. Med-Gemini, from Google DeepMind, is a multimodal system trained on medical datasets including imaging and clinical text. MAIRA-2, from Microsoft Research, pairs imaging with radiology reports for structured reasoning tasks. The Stanford review published in Radiology in 2025 maps this landscape in detail, identifying the key properties that distinguish radiology foundation models from their general-purpose counterparts: large-scale architectures, self-supervised training on medical data, and the ability to generalize across tasks without extensive expert annotation for every new application.
What these efforts share is a recognition that a CT scan is not a photograph, and a chest MRI is not a document. Radiology imaging is three-dimensional, volumetric data. Spatial depth, tissue structure, and change over time are all embedded in a single exam. A model trained on internet images and text does not inherently understand that, and when applied to a clinical context, general-purpose models can produce errors on findings that a trained radiologist would immediately recognize. Put simply, they lack the structural understanding required to interpret what they see with the diagnostic precision that radiology demands.
Where approaches differ is in the architectural choices made to address this. Some models extend vision-language frameworks to the medical domain. Others focus on specific modalities or task types. At Raidium, the decision was to build a foundation model-based approach: not a single model stretched across every task, but multiple specialized foundation models, each trained to address a specific set of imaging challenges. We started with a vision-only framework, Curia, trained specifically on CT and MRI, the modalities that carry the most diagnostic complexity in oncology. Curia addresses the core visual challenges of radiology: anatomy classification, abnormality detection across oncology and emergency conditions, and image registration across CT and MRI. That breadth, built on a single pre-trained backbone, is what makes it possible to build further models on solid ground, each adding capability without compromising the diagnostic precision that oncology workflows demand.
Building a foundation model-based approach for radiology requires the right imaging data, the right architecture, and a sustained focus on the specific demands of oncology workflows. It cannot be approximated by applying guardrails to a general system.
III. Foundation models vs Large Language Models: What is the difference?
Large language models (LLMs) are a category of foundation model, but they are trained primarily on text. Their strength is in reasoning, summarizing, and generating language. In a radiology context, they can be useful for tasks such as supporting report drafting, checking for inconsistencies in completed reports, or pulling relevant history from patient records to support image interpretation. The limitation becomes clear the moment you move from language to images. LLMs cannot interpret a 3D scan, track how a finding has changed across prior exams, or understand the spatial relationships within a volumetric dataset. An LLM works from the text it is given, and its output is bounded by that.
A radiology foundation model is vision-first. The models built at Raidium, Curia and Curia-2, are vision-only frameworks trained across CT and MRI data. What the field calls multimodal in this context refers to imaging modalities, not to text and image combined, and this architecture is what allows the model to close the performance gap with specialized tools without depending on language as a scaffold. Vision-and-language capabilities are a separate, active research track at Raidium. This distinction matters practically. Language models support the reporting layer. A radiology foundation model operates deeper, at the level of image understanding itself, opening up the kind of diagnostic precision that a text-only system cannot reach.
IV. Radiology foundation models vs the last decade's narrow AI tools: Why the difference matters for radiology workflows
Most AI tools deployed in radiology today are narrow by design. One model detects nodules, another segments organs, another measures lesion diameter, and each was built and validated for its task alone. This approach has produced real value in the field, but it has also created a fragmentation problem. Outputs need to be manually reconciled, context does not carry across tools, and every new capability added to the AI radiology workflow introduces another result to interpret in isolation. More tools mean more complexity, and that complexity lands on the radiologist. A foundation model approaches this differently. Because it is pre-trained on broad imaging data, it can be adapted to many tasks using the same underlying model, without rebuilding from scratch each time. It creates a shared technical framework across the workflow, which means outputs are more consistent, development is more scalable, and the different parts of a radiologist's day can be unified rather than assembled from disconnected pieces. This is what building on a foundation actually looks like in practice: the Curia backbone underpins Oncopilot, addressing a different task without requiring a separate model to be trained from the ground up.
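The "one shared backbone, many specializations" pattern can be illustrated with a deliberately tiny sketch. Everything here is a stand-in chosen for readability: `frozen_backbone` computes three summary statistics instead of learned features, the head names and weights are invented, and none of it reflects how Curia or Oncopilot are actually implemented. What it shows is the structure: features are computed once and reused by several lightweight task heads.

```python
from typing import Callable, Dict, List, Sequence

Features = List[float]


def frozen_backbone(image: Sequence[Sequence[float]]) -> Features:
    """Toy stand-in for a pre-trained encoder: mean, max and min intensity."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels), max(pixels), min(pixels)]


def make_linear_head(weights: Sequence[float], bias: float) -> Callable[[Features], float]:
    """Build a small task-specific head that reads the shared features."""
    def head(features: Features) -> float:
        return sum(w * f for w, f in zip(weights, features)) + bias
    return head


# Hypothetical heads; adding a task means adding an entry, not a new backbone.
heads: Dict[str, Callable[[Features], float]] = {
    "abnormality_score": make_linear_head([0.0, 1.0, 0.0], -0.5),
    "contrast_range": make_linear_head([0.0, 1.0, -1.0], 0.0),
}


def run_all_tasks(image: Sequence[Sequence[float]]) -> Dict[str, float]:
    features = frozen_backbone(image)  # computed once, shared by every head
    return {name: head(features) for name, head in heads.items()}


image = [[0.1, 0.2], [0.9, 0.4]]
results = run_all_tasks(image)
print(results)
```

In the narrow-tool setup, each entry in `heads` would instead be a full model with its own feature extractor, trained and validated separately, which is where the fragmentation described above comes from.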
The difference is between having a separate tool for every isolated task versus a single technical framework that underpins many capabilities at once. Technical complexity is managed at the model level. The radiologist is freed to focus on interpretation, on the patient, rather than on navigating the stack. This is the shift that changes what radiological workflows can look like: not more tools layered onto a fragmented system, but a foundation from which a truly unified experience can be built.
V. Where radiology is heading
Foundation models are already reshaping what is possible in AI for radiology, and the gap between research and practice is closing faster than many expected.
The real question is not whether to build on this technology, but how to build it well. That means training on the right imaging data, designing for the specific demands of radiological workflows, and maintaining the transparency and reliability that diagnostic work requires, because a system that cannot be trusted is a system that will not be used.
Raidium is building for that future. Curia is a 3D-first vision framework trained across CT and MRI, and it powers the tools Raidium is building for a unified, AI-native radiology viewer where technical complexity is managed so that doctors can refocus on what matters most: the patient. Not AI as an add-on to a legacy workflow, but a new starting point for the next frontier of radiology.
Ready to explore more about our foundation models?
Here’s where to start:
Read our other Curia blog posts here:
Read our Curia-2 preprint here: https://arxiv.org/abs/2604.01987
Read our RadImageNet-VQA preprint here: https://arxiv.org/abs/2512.17396