Abstract
Background Predicting 12-month mortality is important for optimising end-of-life care for people with Huntington’s disease (HD). Machine learning (ML) may outperform traditional statistical methods when modelling complex, high-dimensional, censored survival data. We compared several tree-based ensemble ML methods for predicting 12-month mortality among patients with HD.
Methods We conducted a retrospective review of the electronic records of adults with genetically confirmed HD at an Australian centre (1 January 2018 to 30 June 2023). Three tree-based ML methods (random survival forest (rfsrc), ranger, and Bayesian Additive Regression Trees (BART)) were compared with the Cox proportional-hazards (CoxPH) model using the area under the receiver operating characteristic curve (AUC). Covariates included patient demographics, functional status, symptoms, complications and comorbidities. Before model development, the dataset was split into training (70%) and validation (30%) sets, following the standard approach. We used dimensionless Shapley values from cooperative game theory to identify the covariates contributing most to predicted death.
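To make the Shapley-value idea concrete, the following is a minimal Python sketch of exact Shapley values for a single prediction. It assumes that covariates "absent" from a coalition are set to a reference (baseline) value. The risk function `toy_risk`, its coefficients and the three indicator covariates are hypothetical and are not taken from the study. This is a generic illustration, not the authors' implementation, and exact enumeration is only practical for a handful of covariates.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; covariates outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0..n-1 (coalitions exclude i)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Input with the coalition plus covariate i set to the patient's values
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                # Same coalition without covariate i
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_risk(z):
    """Hypothetical 12-month risk score; z = [low_tfc, dysphagia, depression] as 0/1 indicators."""
    return 0.1 + 0.4 * z[0] + 0.2 * z[1] + 0.1 * z[0] * z[2]


patient = [1, 1, 1]    # hypothetical patient with all three indicators present
reference = [0, 0, 0]  # hypothetical reference profile
phi = shapley_values(toy_risk, patient, reference)
print(phi)
```

The contributions sum to the patient's predicted risk minus the reference prediction (the efficiency property). The interaction term is split equally between the two covariates involved, which is why the first covariate receives 0.45 rather than 0.4.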
Results Among 343 patients, 71 (20.7%) died. The AUCs for the training and validation sets were, respectively, 0.96 and 0.79 (rfsrc), 0.99 and 0.92 (ranger), 0.94 and 0.90 (BART), and 0.93 and 0.75 (CoxPH). The smaller difference between training and validation AUC for the BART model suggests less overfitting, and hence more reliable out-of-sample predictions, than the other models. Across all ML models, Shapley values showed that poor functional status (UHDRS-TFC < 1) contributed most to death within 12 months. For the BART model, depression, male sex, dysphagia and dementia were additional important predictors.
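The overfitting argument rests on simple arithmetic over the reported AUCs. The short Python sketch below uses only the figures stated above and computes each model's training-minus-validation gap. The dictionary layout is just an illustrative choice.

```python
# Reported (training AUC, validation AUC) for each model
reported_auc = {
    "rfsrc": (0.96, 0.79),
    "ranger": (0.99, 0.92),
    "BART": (0.94, 0.90),
    "CoxPH": (0.93, 0.75),
}

# Training-minus-validation gap; a larger gap indicates more overfitting
gaps = {model: round(train - valid, 2) for model, (train, valid) in reported_auc.items()}

smallest_gap = min(gaps, key=gaps.get)
best_validation = max(reported_auc, key=lambda m: reported_auc[m][1])

for model in sorted(gaps, key=gaps.get):
    print(f"{model}: gap = {gaps[model]:.2f}")
print("smallest gap:", smallest_gap, "| highest validation AUC:", best_validation)
```

BART has the smallest gap (0.04), whereas ranger has the highest validation AUC (0.92). The two criteria therefore rank the models differently.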
Conclusion BART can predict 12-month mortality in HD more accurately than the traditional Cox method.