
Quality assurance (QA) in radiology has a paradoxical side: the very process designed to protect image quality can easily become one of the most time‑consuming, subjective tasks in the department. As scanners multiply, protocols proliferate, and regulatory demands tighten, medical physicists and technologists spend a significant share of their day running phantom scans, scoring images, filling in forms, and chasing trends across scattered spreadsheets.
On paper, a large proportion of QA work can be described in data‑centric terms: detect patterns, measure them, and compare them to tolerances. The classic phantom‑based QA checklist includes measurements such as noise, uniformity, spatial resolution, and low‑contrast detectability, each scored against defined tolerance limits.
Clinical QA adds its own layer: are patients centered, are scan ranges correct, are images free of motion artifacts, and does the study meet diagnostic acceptability criteria such as PGMI in mammography. For decades, these steps have relied on human eyes and manual measurements, which introduces inter‑observer variability and makes long‑term trend analysis cumbersome.
From a machine‑learning perspective, these tasks boil down to image classification, object detection, and regression, exactly the kind of problems deep neural networks are built to solve. This is why fully or semi‑automated QA is now a realistic target, not science fiction.
Research groups and early commercial platforms have shown that convolutional neural networks (CNNs) can score phantom images in CT, MRI, and mammography with expert‑level agreement, often in a fraction of a second. In mammography, for instance, AI models have been trained to recognize fibers, microcalcifications, and mass objects in dedicated phantoms, then output standard scores used in accreditation programs.
More general frameworks take this a step further. One line of work describes a “universal” phantom analysis pipeline that outputs pass/fail decisions and numerical trends ready for dashboards.
In a cloud‑connected setup, the workflow becomes simple: a technologist acquires a scheduled phantom scan, the scanner sends images via DICOM to a gateway, and the AI engine returns a structured QA report within minutes. Instead of manually plotting numbers, physicists are presented with longitudinal curves and automatic alerts when a parameter approaches tolerance limits.
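The structured QA report at the end of this workflow can be sketched in a few lines. The metric names, tolerance values, and function names below are illustrative assumptions for one CT water‑phantom uniformity check, not any vendor's actual API; real limits come from the accreditation program and the site's own baselines:

```python
import statistics

# Hypothetical tolerance table (HU): real limits are program- and site-specific.
TOLERANCES = {"mean_hu": (-4.0, 4.0), "noise_hu": (0.0, 6.0)}

def score_uniformity_roi(pixels):
    """Reduce a water-ROI pixel sample to the two metrics the report tracks."""
    return {"mean_hu": statistics.fmean(pixels),
            "noise_hu": statistics.pstdev(pixels)}

def qa_report(metrics, tolerances=TOLERANCES):
    """Turn raw metrics into a structured pass/fail report: the kind of
    machine-readable payload a dashboard or alerting service can ingest."""
    report = {"checks": {}, "overall": "pass"}
    for name, value in metrics.items():
        lo, hi = tolerances[name]
        ok = lo <= value <= hi
        report["checks"][name] = {"value": round(value, 2),
                                  "limits": (lo, hi),
                                  "status": "pass" if ok else "fail"}
        if not ok:
            report["overall"] = "fail"
    return report
```

Because every check carries its value, limits, and status, downstream tooling can plot longitudinal curves or raise alerts without re-parsing free text.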
The net effect is a shift from episodic, paperwork‑heavy checks to continuous, machine‑readable QA data.
CT has been a natural early target for AI‑driven QA because of its central role, radiation dose considerations, and the rapid adoption of advanced reconstruction methods. Recent work highlights several contributions AI can make, from automated phantom analysis to continuous monitoring of noise and low‑contrast detectability as protocols change.
In practice, this might look like a chest CT protocol updated with deep‑learning reconstruction that reduces radiation by 20-30%, with an AI QA layer continuously confirming that noise and low‑contrast detectability remain within safe bounds. That combination, protocol change plus automated surveillance, makes dose‑reduction initiatives more robust and auditable.
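A rough sketch of that surveillance logic is below. The 1/sqrt(dose) scaling is the standard quantum‑noise approximation for an unchanged reconstruction; the 10% tolerance is a made‑up illustrative limit, not a published one:

```python
import math

def expected_noise_without_dlr(baseline_noise, dose_fraction):
    """Quantum-limited scaling: image noise grows as 1/sqrt(dose), so a
    25% dose cut (dose_fraction=0.75) raises noise roughly 15% if the
    reconstruction is left unchanged."""
    return baseline_noise / math.sqrt(dose_fraction)

def dlr_protocol_ok(measured_noise, baseline_noise, rel_tol=0.10):
    """Surveillance rule (rel_tol is an illustrative limit): after the
    dose-reduced, DLR-reconstructed protocol goes live, measured noise
    must stay within rel_tol of the pre-change baseline."""
    return measured_noise <= baseline_noise * (1.0 + rel_tol)
```

The point of the automated layer is the comparison itself: the dose reduction is only accepted while the measured noise stays near the pre‑change baseline rather than following the uncorrected 1/sqrt(dose) curve.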
Mammography sits at the intersection of high volume, stringent regulatory oversight, and a low tolerance for missed lesions, which makes QA especially demanding. Daily or weekly phantom runs are standard, and many programs still rely on visual scoring of fibers, masses, and microcalcifications using semi‑quantitative scales.
AI has quickly found traction here. Studies on digital mammography and digital breast tomosynthesis phantoms show that CNN‑based systems can score fibers, masses, and microcalcifications automatically, with agreement comparable to expert readers.
Parallel efforts have focused on clinical images: AI models that classify mammograms into PGMI‑style quality categories, flag positioning errors, or detect artifacts that might necessitate repeat imaging. For technologists, this means immediate feedback at the console; for departments, it means lower repeat rates and more consistent exam quality.
For a vendor that designs mammography phantoms and QA software, combining optimized phantom design with AI‑based scoring engines turns a traditional compliance exercise into a data‑rich, scalable quality program.
MRI presents a particularly interesting frontier for AI‑based QA. On the one hand, deep learning plays a key role in accelerating acquisition, suppressing noise, and reducing contrast‑agent dose, making scans faster and safer. On the other hand, quality assurance in MRI is intrinsically harder than in CT or mammography, and current AI tools face several open challenges.
In a way, MRI is not one modality but a family of sequences and contrasts: T1‑weighted, T2‑weighted, FLAIR, diffusion, perfusion, quantitative mapping, and more, with widely varying vendor implementations and local protocol tweaks. AI models trained on a limited set of sequences and centers can struggle to generalize when confronted with different coils, field strengths, or acquisition parameters – the classic “domain shift” problem.
Results from multi‑site MR image‑quality studies show that deep‑learning and conventional machine‑learning methods can perform only modestly when evaluated in a strict leave‑one‑site‑out fashion, emphasizing the need for more diverse training data and robust architectures.
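The leave‑one‑site‑out protocol itself is simple to express; the sketch below is generic scaffolding, where the hypothetical `train_and_score` callback stands in for whatever model training and performance metric a given study uses:

```python
def leave_one_site_out(samples, train_and_score):
    """samples: list of (site_id, example) pairs.
    train_and_score(train, test) -> a performance score (caller-defined).

    Holding out one whole site at a time exposes the domain shift that a
    random split across pooled sites would hide: the model never sees the
    held-out site's scanner, coils, or protocol during training."""
    sites = sorted({site for site, _ in samples})
    scores = {}
    for held_out in sites:
        train = [ex for site, ex in samples if site != held_out]
        test = [ex for site, ex in samples if site == held_out]
        scores[held_out] = train_and_score(train, test)
    return scores
```

Per‑site scores, rather than one pooled number, are what reveal whether a QA model has actually generalized.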
MR images suffer from a rich taxonomy of artifacts: motion, ghosting, susceptibility, fat‑saturation failure, Gibbs ringing and field inhomogeneity, among others. Deep‑learning models have demonstrated good performance in detecting artifacts in specific contexts, such as high b‑value diffusion‑weighted breast MRI; however, not all artifact types are recognized equally well, and false positives remain an issue.
In some cases, networks identify “minor” artifacts that human readers deem acceptable, raising questions about how AI‑based QA should align with clinically meaningful thresholds rather than purely mathematical ones.
High‑quality labels for MRI image quality and QA are expensive to obtain, often relying on multiple expert radiologists or physicists, and can remain subjective even with structured scoring schemes. Reviews of AI in MRI emphasize that segmentation and registration models, which often form part of QA or quantitative pipelines, suffer from limited training data in many clinically important settings (e.g., post‑operative imaging, less common tumor types).
Bias in training data can also propagate into QA systems: one study in prostate MRI found that a deep‑learning image‑quality score was inadvertently influenced by the presence of clinically significant cancer, illustrating how content and quality can become entangled in complex ways.
A label is the correct answer attached to a training image, the ground truth the AI learns from. In MRI, this could be an expert marking a scan as “diagnostically acceptable” or “degraded by motion.”
For AI‑based QA to be trusted in MRI, radiologists and physicists need to understand why a scan was labeled “poor quality” and what should happen next. Black‑box scores without visual explanations or metric breakdowns are difficult to reconcile with existing QA guidelines and accreditation requirements.
Regulatory and implementation perspectives also matter: prospective deployments of AI in MRI remain relatively rare compared with retrospective studies, and rigorous external validation across multiple sites and vendors is still the exception rather than the rule. Reviews consistently list limited generalizability, lack of external validation, and insufficient explainability as core barriers to broad adoption of AI in CT/MRI QA.
Taken together, these challenges do not undermine the promise of AI‑based QA for MRI, but they do underline that careful dataset curation, standardized protocols, and human oversight will be essential.
Once phantom and clinical images are scored automatically and stored centrally, QA changes character. Instead of isolated test results, departments gain access to longitudinal time series of noise, spatial resolution, uniformity, artifact scores, and protocol parameters for every scanner and modality.
On top of these time series, simple statistical models or more advanced machine‑learning approaches can detect subtle drifts (a slow loss of SNR, a creeping increase in geometric distortion, or worsening low‑contrast detectability) and forecast when tolerance limits are likely to be breached. This enables predictive maintenance: service visits can be scheduled before failures disrupt workflows, and physicists can intervene before image quality degrades enough to impact diagnosis.
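A minimal version of the forecasting idea fits in a few lines of standard‑library Python: fit a straight line to recent measurements and extrapolate to the tolerance limit. Real QA programs would use control charts or more robust models; this is only an illustration of the principle:

```python
def linear_trend(values):
    """Least-squares slope and intercept for equally spaced QA measurements."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def measurements_until_breach(values, upper_limit):
    """Naive forecast, assuming the drift stays linear: how many more
    scheduled measurements until the metric crosses upper_limit?
    Returns None when the trend is flat or improving."""
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    t_breach = (upper_limit - intercept) / slope
    return max(0.0, t_breach - (len(values) - 1))
```

Even this toy model turns a time series into an actionable lead time, which is the difference between reacting to a failed test and scheduling service ahead of it.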
In such a scenario, human experts move from manual measurement and box‑ticking toward higher‑level tasks: interpreting trends, refining protocols, and deciding how best to balance throughput, dose or contrast usage, and diagnostic performance.
Despite the impressive progress of AI in QA for CT, MRI, and mammography, the goal is not, and should not be, to replace medical physicists. Instead, the most productive model is one where AI handles the repetitive, high‑volume tasks - scoring phantoms, screening routine studies for obvious quality issues, tracking thousands of metrics over time - and humans concentrate on designing robust QA programs, investigating anomalies, and making clinically informed decisions.
To reach that equilibrium, several principles are key: transparent, explainable outputs that physicists can audit; rigorous multi‑site validation before deployment; careful dataset curation; and clear human oversight of every automated decision.
If those conditions are met, AI can indeed function as an invisible medical physicist - one that never gets tired of counting fibers, measuring noise, or scrolling through quality plots - allowing human experts to focus on what they do best: designing smarter imaging, protecting patients, and pushing the field forward.
Sources:
https://pmc.ncbi.nlm.nih.gov/articles/PMC10039170/