Performance across different versions of an artificial intelligence model for screen-reading of mammograms.

Larsen, M. et al. Performance across different versions of an artificial intelligence model for screen-reading of mammograms. European Radiology (2026).


Abstract

OBJECTIVES: Studies have reported promising results for artificial intelligence (AI) as a tool for improving interpretive performance in mammographic screening. We analyzed AI malignancy risk scores from two versions of the same commercial AI model.

MATERIALS AND METHODS: This retrospective cohort study used data from 117,709 screening examinations performed in BreastScreen Norway between 2009 and 2018. The mammograms were processed by two versions of the commercially available AI model Transpara (versions 1.7 and 2.1). The distributions of exam-level risk scores (AI score 1-10) and risk categories were evaluated for both AI versions on all examinations, including 737 screen-detected and 200 interval cancers. Scores of 1-7 were categorized as low risk, 8-9 as intermediate risk, and 10 as high risk of malignancy.
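The score-to-category rule above can be sketched as a small function. This is a minimal illustration, assuming integer exam-level scores from 1 to 10; the function names and the example scores are hypothetical and are not taken from the study or from Transpara's software.

```python
from collections import Counter


def risk_category(ai_score: int) -> str:
    """Map an exam-level AI score (1-10) to the study's risk category."""
    if not 1 <= ai_score <= 10:
        raise ValueError(f"AI score must be between 1 and 10, got {ai_score}")
    if ai_score <= 7:
        return "low"           # scores 1-7
    if ai_score <= 9:
        return "intermediate"  # scores 8-9
    return "high"              # score 10


def category_distribution(scores):
    """Count how many exams fall into each risk category."""
    counts = Counter(risk_category(s) for s in scores)
    return {c: counts.get(c, 0) for c in ("low", "intermediate", "high")}


# Hypothetical scores for a handful of exams (illustrative only, not study data)
example_scores = [1, 3, 7, 8, 9, 10, 10]
print(category_distribution(example_scores))
# {'low': 3, 'intermediate': 2, 'high': 2}
```

Running the same distribution separately on each version's scores is one way to compare how the categories shift between versions.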

RESULTS: The area under the receiver operating characteristic curve was 0.908 (95% CI: 0.896-0.920) for version 1.7 and 0.928 (95% CI: 0.917-0.939) for version 2.1 when both screen-detected and interval cancers were counted as positive cases (p < 0.001). A total of 87.1% (642/737) and 93.5% (689/737) of the screen-detected cancers had an AI score of 10 with versions 1.7 and 2.1, respectively. Among interval cancers, 45.0% (90/200) and 44.5% (89/200) had an AI score of 10 with versions 1.7 and 2.1, respectively.
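The reported proportions and the AUC can be reproduced in principle with a short sketch. The percentages below are recomputed from the counts given above. The AUC function uses the standard Mann-Whitney interpretation (the probability that a randomly chosen positive exam scores higher than a randomly chosen negative one, with ties counted as one half), which matters here because scores on a 1-10 scale tie often. The function names and the toy scores are hypothetical, and the abstract does not state which test produced p < 0.001.

```python
def percent(numerator: int, denominator: int) -> float:
    """Proportion as a percentage, rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def auc_from_scores(pos, neg):
    """ROC AUC as P(positive score > negative score); ties count as 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Counts reported in the abstract: (cases with AI score 10, total cases)
reported = {
    "screen-detected, v1.7": (642, 737),
    "screen-detected, v2.1": (689, 737),
    "interval, v1.7": (90, 200),
    "interval, v2.1": (89, 200),
}
for label, (k, n) in reported.items():
    print(f"{label}: {percent(k, n)}%")
# screen-detected, v1.7: 87.1%
# screen-detected, v2.1: 93.5%
# interval, v1.7: 45.0%
# interval, v2.1: 44.5%

# Toy scores (hypothetical): cancers as positives, other exams as negatives
print(auc_from_scores([10, 10, 9], [1, 5, 10]))  # 7/9, about 0.778
```

The pairwise loop is quadratic and only suits small examples; for a cohort of about 118,000 exams, a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.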

CONCLUSION: A higher proportion of screen-detected breast cancers had the highest AI score of 10 with the newer version of the AI model compared to the older version. For interval cancers, there was no difference in the proportion of cases assigned to the highest score between the two versions.

KEY POINTS:
Question: Studies have reported promising results regarding the use of AI in mammography screening, but comparisons of updated versus older versions of the same model have received less attention.
Findings: In our study, 87.1% (642/737) of the screen-detected cancers were assigned the high malignancy risk score (AI score 10) by the older version, compared with 93.5% (689/737) by the newer version.
Clinical relevance: Understanding how version updates of AI models might affect screening mammography performance will be important for future quality assurance and validation of AI models.

Last updated on 01/14/2026

Sheng Lab
Stanley Center for Psychiatric Research
Broad Institute of MIT & Harvard
75 Ames Street
Cambridge, MA 02142
