Developing Robust Testing Pipelines for Artificial Intelligence Algorithms in Clinical Decision-Making Systems

Clara Luana

Keywords:

Artificial Intelligence, Clinical Decision-Making, Testing Pipeline, Model Validation, Healthcare AI, Interpretability, Risk Assessment, Deployment Strategy

Abstract

Artificial Intelligence (AI) is increasingly deployed in clinical decision-making (CDM) systems, where reliability and accuracy are paramount. However, the lack of standardized, rigorous testing pipelines remains a significant barrier to safe and effective implementation. This paper proposes a structured, adaptable testing pipeline tailored to AI models in healthcare, emphasizing risk mitigation, interpretability, and clinical relevance. The framework synthesizes insights from prior studies and is organized into modular testing stages that cover both pre-deployment validation and post-deployment monitoring. We also review existing AI validation techniques in clinical contexts and present a process-driven architecture, illustrated with diagrams and accompanied by performance evaluation metrics.
