dc.contributor: Vall d'Hebron Barcelona Hospital Campus
dc.contributor.author: Balaban, Barış
dc.contributor.author: Mağara, Şeyma Selcan
dc.contributor.author: Yucekul, Altug
dc.contributor.author: Obeid, Ibrahim
dc.contributor.author: Pizones, Javier
dc.contributor.author: Pellise, Ferran
dc.contributor.author: Yilgor, Caglar
dc.date.accessioned: 2025-04-11T06:51:48Z
dc.date.available: 2025-04-11T06:51:48Z
dc.date.issued: 2025
dc.identifier.citation: Balaban B, Magara SS, Yilgor C, Yucekul A, Obeid I, Pizones J, et al. Privacy-Preserving Machine Learning (PPML) Inference for Clinically Actionable Models. IEEE Access. 2025;13:37431–56.
dc.identifier.issn: 2169-3536
dc.identifier.uri: http://hdl.handle.net/11351/12938
dc.description: Homomorphic encryption; Machine learning; XGBoost
dc.description.abstract: Machine learning (ML) refers to algorithms (often models) learned directly from data that encode past experience. As algorithms constantly evolve with the exponential growth of computing power and of generated data, the privacy of both algorithms and data becomes critically important due to regulations and intellectual property (IP) rights. It is therefore vital to address the privacy and security of both data and model, alongside other performance metrics, when commercializing machine learning models. Our aim is to show that privacy-preserving machine learning inference methods can safeguard the intellectual property of models, prevent plaintext models from disclosing information about the sensitive data used to train them, and protect the confidentiality of model users' sensitive patient data. We accomplish this by performing a security analysis to determine an appropriate query limit for each user, using the European Spine Study Group's (ESSG) adult spinal deformity dataset. We implement privacy-preserving tree-based machine learning inference and run two security scenarios (scenario A and scenario B), each containing four parts that progressively increase the number of synthetic data points used to enhance the accuracy of the attacker's substitute model. In each scenario, a target model is generated from particular operation site(s), and substitute models are built from the remaining sites' data with nine-time threefold cross-validation using the XGBoost algorithm to assess the security of the target model. First, we create box plots of the test sets' accuracy, sensitivity, precision, and F-score metrics to compare the substitute models' performance with the target model. Second, we compare the gain values of the target and substitute models' features.
Third, we provide an in-depth analysis, using a heatmap, of whether the target model's split points appear in the substitute models. Finally, we compare the outputs of the public and privacy-preserving models and report intermediate timing results. The privacy-preserving XGBoost model is identical to the original plaintext model in prediction accuracy in both scenarios. The differences between the performance metrics of the best-performing substitute models and the target models are 0.27, 0.18, 0.25, and 0.26 for scenario A, and 0.04, 0, 0.04, and 0.03 for scenario B, for accuracy, sensitivity, precision, and F-score, respectively. The differences between the target model's accuracy and the mean accuracy of the models in each scenario on the substitute models' test dataset are 0.38 for scenario A and 0.14 for scenario B. Based on our findings, we conclude that machine learning models (i.e., our target models) can contribute to the advancement of the fields in which they are deployed. Securing both the model and the user data protects the intellectual property of ML models, preventing leakage of sensitive training information and of model users' data.
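The metric comparison described in the abstract can be sketched as follows. This is a minimal illustration of computing accuracy, sensitivity, precision, and F-score for a target model and an attacker's substitute model and reporting the per-metric gap; the confusion-matrix counts are illustrative placeholders, not values from the ESSG dataset or the published results.

```python
# Sketch: compare a target model against a substitute model on four
# metrics derived from binary confusion-matrix counts, as in the
# security analysis summarized above. Counts below are hypothetical.

def metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # recall on the positive class
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f_score": f_score}

target = metrics(tp=40, fp=5, tn=45, fn=10)       # hypothetical target model
substitute = metrics(tp=25, fp=20, tn=30, fn=25)  # hypothetical substitute

# Per-metric difference between target and substitute (cf. the
# per-scenario differences reported in the abstract).
gaps = {k: round(target[k] - substitute[k], 2) for k in target}
print(gaps)
```

A small gap across all four metrics would indicate that the substitute model closely mimics the target, i.e., a weaker security margin for the deployed model.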
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartofseries: IEEE Access;13
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.source: Scientia
dc.subject: Aprenentatge automàtic
dc.subject: Propietat intel·lectual
dc.subject: Bases de dades - Seguretat
dc.subject: Protecció de dades
dc.subject.mesh: Machine Learning
dc.subject.mesh: Intellectual Property
dc.subject.mesh: Confidentiality
dc.title: Privacy-Preserving Machine Learning (PPML) Inference for Clinically Actionable Models
dc.type: info:eu-repo/semantics/article
dc.identifier.doi: 10.1109/ACCESS.2025.3540261
dc.subject.decs: aprendizaje automático
dc.subject.decs: propiedad intelectual
dc.subject.decs: confidencialidad
dc.relation.publishversion: https://doi.org/10.1109/ACCESS.2025.3540261
dc.type.version: info:eu-repo/semantics/publishedVersion
dc.audience: Professionals
dc.contributor.organismes: Institut Català de la Salut
dc.contributor.authoraffiliation: [Balaban B] Department of Biostatistics and Bioinformatics, Institute of Health Sciences, Acıbadem Mehmet Ali Aydınlar University, Istanbul, Türkiye. [Magara SS] Department of Computer Science and Engineering, Sabancı University, Istanbul, Türkiye. Department of Computer Science, University of Tübingen, Tübingen, Germany. [Yilgor C, Yucekul A] Department of Orthopedics and Traumatology, Acibadem University School of Medicine, Istanbul, Türkiye. [Obeid I] Clinique du Dos, Elsan Jean Villar Private Hospital, Bordeaux, France. [Pizones J] Spine Surgery Unit, Hospital Universitario La Paz, Madrid, Spain. [Pellisé F] Unitat de Recerca de la Columna Vertebral, Vall d'Hebron Hospital Universitari, Barcelona, Spain
dc.relation.projectid: info:eu-repo/grantAgreement/EC/H2020/101079319
dc.rights.accessrights: info:eu-repo/semantics/openAccess

