Toward transparency: Implications and future directions of artificial intelligence prediction model reporting in healthcare

Ryan Turlip1, Jonathan H. Sussman1, Jaskeerat Gujral1, Felix C. Oettl2, Irina-Mihaela Matache3, Bhargavi R. Budihal4, Ali K. Ozturk1, Jang W. Yoon1, William C. Welch1, Mert Marcel Dagli1
  1. Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
  2. Department of Orthopedic Surgery, Balgrist University Hospital, University of Zurich, Zürich, Switzerland
  3. Department of Physiology, Faculty of Medicine, Carol Davila University of Medicine and Pharmacy, Bucharest, Romania
  4. Department of General Medicine, BGS Global Institute of Medical Sciences, Bengaluru, Karnataka, India

Correspondence Address:
Mert Marcel Dagli, Department of Neurosurgery, University of Pennsylvania, Perelman School of Medicine, Philadelphia, United States.

DOI:10.25259/SNI_178_2025

Copyright: © 2025 Surgical Neurology International. This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-Share Alike 4.0 License, which allows others to remix, transform, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms.

How to cite this article: Turlip R, Sussman JH, Gujral J, Oettl FC, Matache I, Budihal BR, Ozturk AK, Yoon JW, Welch WC, Dagli MM. Toward transparency: Implications and future directions of artificial intelligence prediction model reporting in healthcare. Surg Neurol Int 11-Apr-2025;16:135

How to cite this URL: Turlip R, Sussman JH, Gujral J, Oettl FC, Matache I, Budihal BR, Ozturk AK, Yoon JW, Welch WC, Dagli MM. Toward transparency: Implications and future directions of artificial intelligence prediction model reporting in healthcare. Surg Neurol Int 11-Apr-2025;16:135. Available from: https://surgicalneurologyint.com/?post_type=surgicalint_articles&p=13492

Date of Submission
19-Feb-2025

Date of Acceptance
19-Mar-2025

Date of Web Publication
11-Apr-2025

INTRODUCTION

The integration of artificial intelligence (AI) within healthcare represents a transformative paradigm shift, ushering in an era of unprecedented progress in healthcare decision-making and data-driven analytics to improve patient outcomes. This advancement is driven largely by machine learning (ML), a subset of AI in which algorithms learn from data to make predictions.[ 4 ] Traditional statistical models, such as regression, predict outcomes by discerning relationships between independent predictor variables and dependent outcome variables of interest while adjusting for confounders. ML offers many additional model classes, such as support vector machines, ensemble-based methods, and artificial neural networks.[ 4 ] These models enable more accurate predictions across a wider range of datasets while requiring fewer assumptions about the underlying distribution and characteristics of the data.
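To make this contrast concrete, the minimal sketch below (written with the open-source scikit-learn library on synthetic data; it is an illustration, not a model from any study cited here) fits a regression-based model and an ensemble-based model to the same dataset and compares their discrimination:

    # Minimal sketch on synthetic data; assumes scikit-learn is installed.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Simulate a binary outcome with ten candidate predictors.
    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # A traditional regression-based model and an ensemble-based ML model.
    logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

    for name, model in [("logistic regression", logit), ("random forest", forest)]:
        auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        print(f"{name}: held-out AUC = {auc:.2f}")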

In healthcare, the shift toward more complex ML algorithms for nuanced datasets holds promise in enhancing the predictive capabilities for patient outcomes. However, the rapid adoption of AI prediction models has outpaced the development of proper clinical and research guidelines, raising concerns about reliability, validity, reproducibility, data security, and potential biases. Addressing these challenges is crucial to ensure the effective and trustworthy integration of AI tools into clinical practice. Hence, in this perspective, we highlight the unique implications and challenges posed by AI prediction models and explore future directions for reporting guidelines tailored to AI in healthcare.

IMPLICATIONS AND CHALLENGES OF AI PREDICTION MODELS

AI prediction models raise unique considerations, such as opaqueness, the need for validation frameworks, and clinical applicability, that make creating clinical and research guidelines challenging.[ 5 , 6 ] Opaqueness, often referred to as the “black box” problem, arises when the decision-making process of AI algorithms is not transparent or interpretable to users.[ 4 , 5 ] For example, while a linear regression model can be easily conceptualized as fitting points to a line, neural network models involve multiple layers of mathematical formulas that do not result in any tangible shape that can be visually understood. Thus, the complexity of how these models derive their predictions can hinder clinicians’ ability to trust and effectively integrate AI recommendations into patient care. Addressing opaqueness is crucial for ensuring that AI tools are not only technically sophisticated but also clinically relevant and understandable. Mitigating this challenge involves including interpretable ML models within the data analytics pipeline rather than merely showcasing the top-performing model. This approach advocates for presenting a range of models, from explainable to complex, where the simpler models offer insights into the significance of predictors, and the more complex models build on these conceptual principles to yield more accurate and generalizable predictions. Such insights from simpler models contribute to the scientific body of knowledge, even when a more complex model, due to its superior performance, might be preferred in practical applications.[ 3 ]
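A brief sketch of this “range of models” idea follows (again using scikit-learn on synthetic data, purely for illustration): the coefficients of the simpler model convey the direction and relative weight of each predictor, while the more complex model is compared on discrimination alone.

    # Sketch of pairing an explainable model with a more complex one; synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=8, n_informative=4, random_state=0)

    # Explainable model: coefficients indicate how each predictor pushes the prediction.
    explainable = LogisticRegression(max_iter=1000).fit(X, y)
    for i, coef in enumerate(explainable.coef_[0]):
        print(f"predictor {i}: coefficient = {coef:+.2f}")

    # Complex model: often better discrimination, but no directly readable parameters.
    complex_model = GradientBoostingClassifier(random_state=0)
    print("explainable model, cross-validated AUC:",
          round(cross_val_score(explainable, X, y, scoring="roc_auc").mean(), 2))
    print("complex model, cross-validated AUC:",
          round(cross_val_score(complex_model, X, y, scoring="roc_auc").mean(), 2))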

External validation frameworks are critical in assessing generalizability and reliability.[ 4 , 5 ] These frameworks test AI algorithms on independent data sets that were not used during the model’s training phase, offering a rigorous evaluation of the model’s performance in real-world scenarios. The importance of external validation lies in its ability to expose and mitigate overfitting, whereby a model performs exceptionally well on its training data but is insufficiently generalizable and performs poorly on unseen data. By employing these frameworks, researchers and clinicians can ensure that AI tools maintain their predictive accuracy and clinical relevance across diverse patient populations and healthcare settings. Despite the importance of external validation, such studies are rare and often hindered by inadequate reporting and data sharing at the time of publication and thereafter.
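The sketch below illustrates the external validation workflow itself (scikit-learn, synthetic data): a model is trained on a development cohort and then scored on a cohort it never saw. Because the two synthetic cohorts here are unrelated, the drop in performance is exaggerated, but the pattern of comparing apparent with external performance is the point.

    # Sketch of external validation; two synthetic cohorts stand in for the
    # development dataset and an independent external dataset.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score

    X_dev, y_dev = make_classification(n_samples=800, n_features=12, random_state=1)
    X_ext, y_ext = make_classification(n_samples=400, n_features=12, random_state=2)

    model = RandomForestClassifier(random_state=0).fit(X_dev, y_dev)

    # Apparent performance (on the training data) is optimistic; external
    # performance on unseen data is what reveals overfitting.
    apparent_auc = roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1])
    external_auc = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
    print(f"apparent AUC = {apparent_auc:.2f}, external AUC = {external_auc:.2f}")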

The clinical applicability of AI in healthcare hinges on the seamless integration of AI models into the existing healthcare infrastructure, ensuring that these technologies can be effectively utilized in real-world patient care settings. This entails not only the technical compatibility of AI systems with clinical workflows but also the models’ ability to produce actionable insights relevant to patient-specific conditions and treatment plans. Achieving clinical applicability requires rigorous testing and validation to confirm that AI tools are reliable and accurate and that they enhance decision-making processes. Furthermore, it necessitates collaboration among engineers, data scientists, and clinicians to tailor AI solutions to the nuanced demands of healthcare. For example, electronic health records have already started integrating natural language processing and AI prediction models into their systems to enhance patient care and clinical decision-making.[ 6 ]
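As one small, hypothetical example of a workflow-integration concern, the function below (all field names are illustrative assumptions, not taken from any specific EHR system or guideline) refuses to return a risk estimate when a record lacks the predictors the model expects:

    # Hypothetical sketch; field names and the wrapped model are illustrative only.
    REQUIRED_FIELDS = ("age_years", "creatinine_mg_dl", "systolic_bp_mmhg")

    def predict_risk(patient_record: dict, model) -> float:
        """Return a predicted risk, refusing records that lack required predictors."""
        missing = [f for f in REQUIRED_FIELDS if f not in patient_record]
        if missing:
            raise ValueError(f"record is missing required predictors: {missing}")
        features = [[patient_record[f] for f in REQUIRED_FIELDS]]
        # Assumes a fitted scikit-learn-style classifier exposing predict_proba.
        return float(model.predict_proba(features)[0, 1])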

FUTURE DIRECTIONS OF AI PREDICTION MODEL REPORTING IN HEALTHCARE

Although widely adopted, AI models lack standardized and rigorous reporting practices, compromising their reliability and validity. To address these deficiencies, the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis (TRIPOD) statement was introduced in 2015, offering a 22-item checklist to aid transparent reporting of prediction model development, validation, and updating.[ 3 ] In 2021, the TRIPOD group published a protocol for the development of an extended reporting guideline (TRIPOD+AI) and a risk of bias tool for diagnostic and prognostic prediction model studies that apply ML techniques.[ 1 ] Acknowledging the pressing need for robust guidelines in the rapidly evolving landscape of AI in healthcare, the recently published TRIPOD+AI guidelines, now a 27-item checklist, represent a significant advancement in the standardization of AI prediction model reporting in healthcare.[ 2 ]

Designed to ensure transparency, validity, and utility in AI prediction studies, TRIPOD+AI provides an extended structural framework that aids academic institutions, researchers, journal editors, peer reviewers, funders, patients, policymakers, medical device manufacturers, and healthcare professionals in evaluating AI prediction studies more rigorously.[ 2 ] It particularly stresses the importance of fairness, the promotion of open scientific practices, and the engagement of the public and patients in the research process, ensuring that these models serve a broad and diverse population effectively. The TRIPOD+AI framework ensures that the methods, parameters, and input data used to tune the AI algorithms and assess their performance (accuracy, sensitivity, specificity, etc.) are systematically documented. Importantly, it also aims to ensure that the outputs from the algorithms are interpreted correctly, with particular attention to the potential limitations and caveats. This includes key assumptions and requirements for the input data and patient characteristics as well as important details relevant to real-world AI applications, such as handling of poor-quality data, missing values, and outliers.
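As a rough sketch of what such systematic documentation can look like in practice, the structured record below gathers hyperparameters, data handling, and performance metrics in one machine-readable place; the field names and values are illustrative assumptions and do not reproduce the TRIPOD+AI checklist items.

    # Illustrative model-reporting record; all values are invented for the example.
    import json

    model_report = {
        "model_type": "gradient boosting classifier",
        "hyperparameters": {"n_estimators": 200, "learning_rate": 0.05, "max_depth": 3},
        "training_data": {"source": "illustrative single-centre EHR extract", "n_patients": 4812},
        "missing_data_handling": "multiple imputation of laboratory values; outliers winsorized",
        "performance": {"auc": 0.81, "sensitivity": 0.74, "specificity": 0.79},
        "intended_use": "illustrative pre-operative risk stratification",
        "limitations": "not externally validated; example only",
    }
    print(json.dumps(model_report, indent=2))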

These guidelines are designed to be applicable across a wide range of healthcare settings, including public health, primary care, and nursing homes. They cater to both prognostic and diagnostic models, addressing the unique challenges posed by AI in healthcare and ensuring that the benefits of AI are accessible across various medical and patient contexts.[ 2 ] While the TRIPOD+AI guidelines did not specifically consider large language models, such as the increasingly prominent ChatGPT, during their initial development, focusing instead on non-generative models, their principles remain relevant and translatable and can substantially advance transparency in the development and evaluation of generative AI within healthcare.[ 2 , 3 ] To ensure these guidelines continue to be applicable amidst the rapid evolution of AI technologies, it will be imperative to periodically update them, considering the latest advancements, including those in generative modeling.

Despite the accessibility of the TRIPOD statement and evidence of improved reporting in a pre-post analysis, substantial deficiencies in reporting standards for multivariable prediction model studies remain prevalent and need to be acknowledged. As the TRIPOD+AI statement gains traction, however, its widespread adoption holds the promise of bolstering methodological standards in regression and AI prediction studies, thereby fostering greater reliability, reproducibility, and ultimately, improved patient outcomes.[ 2 ]

Disclaimer

The views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the Journal or its management. The information contained in this article should not be considered to be medical advice; patients should consult their own physicians for advice as to their specific medical needs.

References

1. Collins GS, Dhiman P, Andaur Navarro CL, Ma J, Hooft L, Reitsma JB. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open. 2021. 11: e048008

2. Collins GS, Moons KG, Dhiman P, Riley RD, Beam AL, Van Calster B. TRIPOD+AI statement: Updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024. 385: e078378

3. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. BMC Med. 2015. 13: 1

4. Dagli MM, Rajesh A, Asaad M, Butler CE. The use of artificial intelligence and machine learning in surgery: A comprehensive literature review. Am Surg. 2023. 89: 1980-8

5. Durán JM, Jongsma KR. Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI. J Med Ethics. 2021. p. medethics-2020-106820

6. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019. 17: 195
