
Quality assurance (QA) in radiology has a paradoxical side: the very process designed to protect image quality can easily become one of the most time‑consuming, subjective tasks in the department. As scanners multiply, protocols proliferate, and regulatory demands tighten, medical physicists and technologists spend a significant share of their day running phantom scans, scoring images, filling in forms, and chasing trends across scattered spreadsheets.
On paper, a large proportion of QA work can be described in data‑centric terms: detect patterns, measure them, and compare them to tolerances. The classic phantom‑based QA checklist includes measurements such as noise, uniformity, spatial resolution, and low‑contrast detectability, each scored against defined tolerance limits.
Clinical QA adds its own layer: are patients centered, are scan ranges correct, are images free of motion artifacts, and does the study meet diagnostic acceptability criteria such as PGMI in mammography. For decades, these steps have relied on human eyes and manual measurements, which introduces inter‑observer variability and makes long‑term trend analysis cumbersome.
From a machine‑learning perspective, these tasks boil down to image classification, object detection, and regression, exactly the kind of problems deep neural networks are built to solve. This is why fully or semi‑automated QA is now a realistic target, not science fiction.
Research groups and early commercial platforms have shown that convolutional neural networks (CNNs) can score phantom images in CT, MRI, and mammography with expert‑level agreement, often in a fraction of a second. In mammography, for instance, AI models have been trained to recognize fibers, microcalcifications, and mass objects in dedicated phantoms, then output standard scores used in accreditation programs.
More general frameworks take this a step further. One line of work describes a “universal” phantom analysis pipeline that outputs pass/fail decisions and numerical trends ready for dashboards.
In a cloud‑connected setup, the workflow becomes simple: a technologist acquires a scheduled phantom scan, the scanner sends images via DICOM to a gateway, and the AI engine returns a structured QA report within minutes. Instead of manually plotting numbers, physicists are presented with longitudinal curves and automatic alerts when a parameter approaches tolerance limits.
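The structured QA report at the end of this workflow can be sketched in a few lines. The metric names, tolerance values, and function names below are illustrative assumptions for one CT water‑phantom uniformity check, not any vendor's actual API; real limits come from the accreditation program and the site's own baselines:

```python
import statistics

# Hypothetical tolerance table (HU): real limits are program- and site-specific.
TOLERANCES = {"mean_hu": (-4.0, 4.0), "noise_hu": (0.0, 6.0)}

def score_uniformity_roi(pixels):
    """Reduce a water-ROI pixel sample to the two metrics the report tracks."""
    return {"mean_hu": statistics.fmean(pixels),
            "noise_hu": statistics.pstdev(pixels)}

def qa_report(metrics, tolerances=TOLERANCES):
    """Turn raw metrics into a structured pass/fail report: the kind of
    machine-readable payload a dashboard or alerting service can ingest."""
    report = {"checks": {}, "overall": "pass"}
    for name, value in metrics.items():
        lo, hi = tolerances[name]
        ok = lo <= value <= hi
        report["checks"][name] = {"value": round(value, 2),
                                  "limits": (lo, hi),
                                  "status": "pass" if ok else "fail"}
        if not ok:
            report["overall"] = "fail"
    return report
```

Because every check carries its value, limits, and status, downstream tooling can plot longitudinal curves or raise alerts without re-parsing free text.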
The net effect is a shift from episodic, paperwork‑heavy checks to continuous, machine‑readable QA data.
CT has been a natural early target for AI‑driven QA because of its central role, radiation dose considerations, and the rapid adoption of advanced reconstruction methods. Recent work highlights several contributions AI can make, from automated phantom analysis to continuous monitoring of noise and low‑contrast detectability as protocols change.
In practice, this might look like a chest CT protocol updated with deep‑learning reconstruction that reduces radiation by 20-30%, with an AI QA layer continuously confirming that noise and low‑contrast detectability remain within safe bounds. That combination, protocol change plus automated surveillance, makes dose‑reduction initiatives more robust and auditable.
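A rough sketch of that surveillance logic is below. The 1/sqrt(dose) scaling is the standard quantum‑noise approximation for an unchanged reconstruction; the 10% tolerance is a made‑up illustrative limit, not a published one:

```python
import math

def expected_noise_without_dlr(baseline_noise, dose_fraction):
    """Quantum-limited scaling: image noise grows as 1/sqrt(dose), so a
    25% dose cut (dose_fraction=0.75) raises noise roughly 15% if the
    reconstruction is left unchanged."""
    return baseline_noise / math.sqrt(dose_fraction)

def dlr_protocol_ok(measured_noise, baseline_noise, rel_tol=0.10):
    """Surveillance rule (rel_tol is an illustrative limit): after the
    dose-reduced, DLR-reconstructed protocol goes live, measured noise
    must stay within rel_tol of the pre-change baseline."""
    return measured_noise <= baseline_noise * (1.0 + rel_tol)
```

The point of the automated layer is the comparison itself: the dose reduction is only accepted while the measured noise stays near the pre‑change baseline rather than following the uncorrected 1/sqrt(dose) curve.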
Mammography sits at the intersection of high volume, stringent regulatory oversight, and a low tolerance for missed lesions, which makes QA especially demanding. Daily or weekly phantom runs are standard, and many programs still rely on visual scoring of fibers, masses, and microcalcifications using semi‑quantitative scales.
AI has quickly found traction here. Studies on digital mammography and digital breast tomosynthesis phantoms show that CNN‑based systems can score fibers, masses, and microcalcifications automatically, with agreement comparable to expert readers.
Parallel efforts have focused on clinical images: AI models that classify mammograms into PGMI‑style quality categories, flag positioning errors, or detect artifacts that might necessitate repeat imaging. For technologists, this means immediate feedback at the console; for departments, it means lower repeat rates and more consistent exam quality.
For a vendor that designs mammography phantoms and QA software, combining optimized phantom design with AI‑based scoring engines turns a traditional compliance exercise into a data‑rich, scalable quality program.
MRI presents a particularly interesting frontier for AI‑based QA. On the one hand, deep learning plays a key role in accelerating acquisition, suppressing noise, and reducing contrast‑agent dose, making scans faster and safer. On the other hand, quality assurance in MRI is intrinsically harder than in CT or mammography, and current AI tools face several open challenges.
In a way, MRI is not one modality but a family of sequences and contrasts: T1‑weighted, T2‑weighted, FLAIR, diffusion, perfusion, quantitative mapping, and more, with widely varying vendor implementations and local protocol tweaks. AI models trained on a limited set of sequences and centers can struggle to generalize when confronted with different coils, field strengths, or acquisition parameters – the classic “domain shift” problem.
Results from multi‑site MR image‑quality studies show that deep‑learning and conventional machine‑learning methods can perform only modestly when evaluated in a strict leave‑one‑site‑out fashion, emphasizing the need for more diverse training data and robust architectures.
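The leave‑one‑site‑out protocol itself is simple to express; the sketch below is generic scaffolding, where the hypothetical `train_and_score` callback stands in for whatever model training and performance metric a given study uses:

```python
def leave_one_site_out(samples, train_and_score):
    """samples: list of (site_id, example) pairs.
    train_and_score(train, test) -> a performance score (caller-defined).

    Holding out one whole site at a time exposes the domain shift that a
    random split across pooled sites would hide: the model never sees the
    held-out site's scanner, coils, or protocol during training."""
    sites = sorted({site for site, _ in samples})
    scores = {}
    for held_out in sites:
        train = [ex for site, ex in samples if site != held_out]
        test = [ex for site, ex in samples if site == held_out]
        scores[held_out] = train_and_score(train, test)
    return scores
```

Per‑site scores, rather than one pooled number, are what reveal whether a QA model has actually generalized.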
MR images suffer from a rich taxonomy of artifacts: motion, ghosting, susceptibility, fat‑saturation failure, Gibbs ringing and field inhomogeneity, among others. Deep‑learning models have demonstrated good performance in detecting artifacts in specific contexts, such as high b‑value diffusion‑weighted breast MRI; however, not all artifact types are recognized equally well, and false positives remain an issue.
In some cases, networks identify “minor” artifacts that human readers deem acceptable, raising questions about how AI‑based QA should align with clinically meaningful thresholds rather than purely mathematical ones.
High‑quality labels for MRI image quality and QA are expensive to obtain, often relying on multiple expert radiologists or physicists, and can remain subjective even with structured scoring schemes. Reviews of AI in MRI emphasize that segmentation and registration models, which often form part of QA or quantitative pipelines, suffer from limited training data in many clinically important settings (e.g., post‑operative imaging, less common tumor types).
Bias in training data can also propagate into QA systems: one study in prostate MRI found that a deep‑learning image‑quality score was inadvertently influenced by the presence of clinically significant cancer, illustrating how content and quality can become entangled in complex ways.
A label is the correct answer attached to a training image, the ground truth the AI learns from. In MRI, this could be an expert marking a scan as “diagnostically acceptable” or “degraded by motion.”
For AI‑based QA to be trusted in MRI, radiologists and physicists need to understand why a scan was labeled “poor quality” and what should happen next. Black‑box scores without visual explanations or metric breakdowns are difficult to reconcile with existing QA guidelines and accreditation requirements.
Regulatory and implementation perspectives also matter: prospective deployments of AI in MRI remain relatively rare compared with retrospective studies, and rigorous external validation across multiple sites and vendors is still the exception rather than the rule. Reviews consistently list limited generalizability, lack of external validation, and insufficient explainability as core barriers to broad adoption of AI in CT/MRI QA.
Taken together, these challenges do not undermine the promise of AI‑based QA for MRI, but they do underline that careful dataset curation, standardized protocols, and human oversight will be essential.
Once phantom and clinical images are scored automatically and stored centrally, QA changes character. Instead of isolated test results, departments gain access to longitudinal time series of noise, spatial resolution, uniformity, artifact scores, and protocol parameters for every scanner and modality.
On top of these time series, simple statistical models or more advanced machine‑learning approaches can detect subtle drifts (a slow loss of SNR, a creeping increase in geometric distortion, or worsening low‑contrast detectability) and forecast when tolerance limits are likely to be breached. This enables predictive maintenance: service visits can be scheduled before failures disrupt workflows, and physicists can intervene before image quality degrades enough to impact diagnosis.
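A minimal version of the forecasting idea fits in a few lines of standard‑library Python: fit a straight line to recent measurements and extrapolate to the tolerance limit. Real QA programs would use control charts or more robust models; this is only an illustration of the principle:

```python
def linear_trend(values):
    """Least-squares slope and intercept for equally spaced QA measurements."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def measurements_until_breach(values, upper_limit):
    """Naive forecast, assuming the drift stays linear: how many more
    scheduled measurements until the metric crosses upper_limit?
    Returns None when the trend is flat or improving."""
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    t_breach = (upper_limit - intercept) / slope
    return max(0.0, t_breach - (len(values) - 1))
```

Even this toy model turns a time series into an actionable lead time, which is the difference between reacting to a failed test and scheduling service ahead of it.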
In such a scenario, human experts move from manual measurement and box‑ticking toward higher‑level tasks: interpreting trends, refining protocols, and deciding how best to balance throughput, dose or contrast usage, and diagnostic performance.
Despite the impressive progress of AI in QA for CT, MRI, and mammography, the goal is not, and should not be, to replace medical physicists. Instead, the most productive model is one where AI handles the repetitive, high‑volume tasks - scoring phantoms, screening routine studies for obvious quality issues, tracking thousands of metrics over time - and humans concentrate on designing robust QA programs, investigating anomalies, and making clinically informed decisions.
To reach that equilibrium, several principles are key: transparent, explainable outputs that physicists can audit; rigorous multi‑site validation before deployment; careful dataset curation; and clear human oversight of every automated decision.
If those conditions are met, AI can indeed function as an invisible medical physicist - one that never gets tired of counting fibers, measuring noise, or scrolling through quality plots - allowing human experts to focus on what they do best: designing smarter imaging, protecting patients, and pushing the field forward.
Sources:
https://pmc.ncbi.nlm.nih.gov/articles/PMC10039170/