Unveiling the Superiority of Regularised Bayesian Piecewise Survival Models for Customer Churn Predictions
By Sofia Maria Karadimitriou, Jerome Carayol, Armand Valsesia
In the realm of customer churn analysis, organizations face the challenge of understanding and predicting customer attrition. While classification models have traditionally been utilized for churn prediction, survival models present a more advanced approach. They provide insights not only into whether customers will churn, but also when they are likely to do so. This article aims to illuminate the reasons why survival models are favoured for customer churn analysis over classification models. We will explore the key differences between these two approaches and highlight the unique benefits that survival models bring.
Asking the Right Questions
Classification models are designed to answer a straightforward binary question: will a customer churn at a specific time point or not? Survival models extend beyond this binary outcome and provide churn probability predictions over different time intervals. This temporal perspective equips businesses not only to act proactively and efficiently, but also to allocate resources effectively and to customize the user experience according to the predicted churn timings. For instance, users expected to churn in the short term would be presented with short-term challenges, while users more likely to churn over a longer period would benefit from a more refined journey. Although it is not the primary strength of survival models, it is worth noting that, like any churn system, they also enable the targeting of customers at higher risk at any given time.
When Censoring is Essential
Censoring refers to situations where the observation of churn is incomplete, either because the historical data cover a limited period or because customers are still engaged with the business. Survival models are designed to handle right-censoring, which is a key advantage over classification models in customer churn analysis. By accounting for right-censored observations, i.e., customers who remain engaged or have not churned by the end of the observation period, survival models yield accurate predictions while reflecting the inherent uncertainty in customer behaviour.
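To make right-censoring concrete, here is a minimal sketch of how customer records might be turned into the (duration, event) pairs that survival models consume. The function name, the dates, and the choice of days as the time unit are illustrative assumptions, not part of any specific implementation.

```python
from datetime import date

def to_survival_record(signup, churn_date, observation_end):
    """Return (duration_days, event) for one customer.

    event = 1 if churn was observed inside the observation window,
    event = 0 if the customer is right-censored (still active at the end).
    """
    if churn_date is not None and churn_date <= observation_end:
        return (churn_date - signup).days, 1
    # Churn not observed: we only know the customer survived this long.
    return (observation_end - signup).days, 0

# Hypothetical observation window and customers.
observation_end = date(2023, 6, 30)
customers = [
    (date(2023, 1, 1), date(2023, 3, 2)),  # churn observed
    (date(2023, 2, 1), None),              # still active: right-censored
]

records = [to_survival_record(s, c, observation_end) for s, c in customers]
print(records)
```

A classifier would have to either drop the censored customer or label them as "not churned", whereas a survival model keeps the partial information that they stayed for at least the observed duration.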
One Model to Rule Them All
With classification models, predicting churn at multiple time points requires a separate model for each time point. For example, a business that wants to predict daily churn over a span of 30 days has to train and evaluate 30 models, which is costly in both time and computing resources. Conversely, a single survival model predicts churn across all of these time points, drastically reducing time and effort. As a result, businesses can generate predictions for multiple time horizons within a few hours, with a solution that is not only efficient but also simpler to deploy and maintain.
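As an illustration of how one fitted model can serve many horizons, the sketch below computes churn probabilities at several time points from a single set of piecewise-constant hazard rates. The cut points and hazard values are made-up numbers for one hypothetical customer, and the function name is an assumption.

```python
import math

def survival_probability(t, cut_points, hazards):
    """S(t) = exp(-sum_j h_j * time spent in interval j before t)."""
    starts = [0.0] + list(cut_points)
    ends = list(cut_points) + [math.inf]
    cumulative_hazard = 0.0
    for start, end, h in zip(starts, ends, hazards):
        exposure = max(0.0, min(t, end) - start)  # time at risk in this interval
        cumulative_hazard += h * exposure
    return math.exp(-cumulative_hazard)

# Hypothetical fitted values: intervals [0, 7), [7, 14), [14, inf) in days.
cut_points = [7, 14]
hazards = [0.02, 0.01, 0.005]  # daily hazard rate in each interval

# One set of hazards gives a churn probability for any horizon.
churn_by_day = {
    day: 1.0 - survival_probability(day, cut_points, hazards)
    for day in (7, 14, 30)
}
for day, p in churn_by_day.items():
    print(f"P(churn within {day} days) = {p:.3f}")
```

The same function can be evaluated at every day from 1 to 30, so the 30 separate classifiers from the example above collapse into 30 cheap evaluations of one model.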
Putting the Pieces Together
A Bayesian piecewise survival model (Lázaro et al., 2021) with horseshoe regularisation (Carvalho et al., 2009) on its coefficients is a sophisticated approach to analysing survival data and predicting the timing of events such as customer churn. Let us look at the mechanics of this model to understand how it works.
The Bayesian piecewise survival model partitions the observation period into distinct intervals, each with its own hazard rate. This enables the model to capture changes over time in the risk of the event of interest. Furthermore, because it is equivalent to a Poisson log-linear model (Holford, 1980), it adds flexibility to the modelling framework and simplifies interpretation. Unlike frequentist or ML approaches, Bayesian approaches also let us incorporate prior knowledge and update our beliefs as data accrue, resulting in models that are both more versatile and easier to interpret. Finally, Bayesian models quantify and communicate uncertainty through posterior distributions. This gives a deeper understanding of the uncertainty surrounding the results, which improves decision-making and strengthens the reliability of conclusions drawn from the data.
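One way to see the link to a Poisson log-linear model is to expand each customer into one row per interval they were at risk in, recording the exposure time and whether churn happened there. The sketch below does this for a baseline-only model without covariates; the function names and the numbers are illustrative assumptions.

```python
import math

def expand_to_intervals(duration, event, cut_points):
    """Split one (duration, event) record into per-interval rows.

    Each row is (interval_index, exposure, churned_in_interval).
    """
    starts = [0] + list(cut_points)
    ends = list(cut_points) + [math.inf]
    rows = []
    for j, (start, end) in enumerate(zip(starts, ends)):
        if duration <= start:
            break  # customer was no longer at risk in this interval
        exposure = min(duration, end) - start
        churned_here = int(event == 1 and duration <= end)
        rows.append((j, exposure, churned_here))
    return rows

def log_likelihood(rows, hazards):
    """Piecewise exponential log-likelihood: sum of d*log(h_j) - h_j*exposure.

    This matches a Poisson log-likelihood with mean h_j * exposure,
    up to a term that does not depend on the hazards.
    """
    return sum(d * math.log(hazards[j]) - hazards[j] * e for j, e, d in rows)

cut_points = [7, 14]           # hypothetical interval boundaries (days)
hazards = [0.02, 0.01, 0.005]  # hypothetical hazard per interval

rows = expand_to_intervals(duration=10, event=1, cut_points=cut_points)
print(rows)
print(log_likelihood(rows, hazards))
```

In a full model, the log of each hazard would be a linear function of customer covariates, with the log of the exposure entering as an offset.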
Regularisation techniques ward off overfitting and enhance the model’s ability to generalise to unfamiliar data. Horseshoe regularisation is a Bayesian shrinkage technique that encourages sparsity in the model’s coefficients. This allows both shrinkage (reducing the impact of irrelevant variables) and selection (identifying relevant variables).
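To build intuition for this behaviour, the following sketch draws coefficients from a horseshoe prior, where each coefficient is normal with a scale equal to a global scale times its own half-Cauchy local scale, and compares them with draws from a standard normal prior. Fixing the global scale `tau` at 1 is a simplifying assumption; in practice it usually receives its own prior.

```python
import math
import random

def sample_horseshoe(n, tau=1.0, rng=random):
    """Draw n coefficients: beta_j ~ Normal(0, (tau * lambda_j)^2),
    with local scales lambda_j ~ HalfCauchy(0, 1)."""
    draws = []
    for _ in range(n):
        # Absolute value of a standard Cauchy draw (inverse-CDF method).
        local_scale = abs(math.tan(math.pi * (rng.random() - 0.5)))
        draws.append(rng.gauss(0.0, 1.0) * tau * local_scale)
    return draws

def share(values, predicate):
    """Fraction of values that satisfy the predicate."""
    return sum(1 for v in values if predicate(v)) / len(values)

rng = random.Random(0)  # seeded for reproducibility
horseshoe = sample_horseshoe(20000, tau=1.0, rng=rng)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]

for name, draws in (("horseshoe", horseshoe), ("gaussian", gaussian)):
    near_zero = share(draws, lambda b: abs(b) < 0.1)
    large = share(draws, lambda b: abs(b) > 3.0)
    print(f"{name}: near zero = {near_zero:.3f}, large = {large:.3f}")
```

The horseshoe places more mass both very close to zero and far out in the tails than the Gaussian, which is what lets it shrink irrelevant coefficients strongly while leaving large signals mostly untouched.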
Variational Inference (VI) has several advantages over Markov Chain Monte Carlo (MCMC) algorithms for estimating complex Bayesian models:
- Computationally faster and more scalable, making it apt for large datasets.
- Yields deterministic outcomes, facilitating straightforward interpretation and informed decision-making.
- Flexible in terms of model specification and customisation, accommodating various priors and regularisation techniques.
- Easier to implement than MCMC, making it more accessible to a wider range of practitioners.
- Strikes an optimal balance between computational efficiency and accuracy, making it a preferred choice for large-scale problems.
Once the model is estimated, posterior distributions of the model parameters are obtained. From these distributions, one can derive not only personalised point estimates but also credible intervals for both the coefficients and the hazard rates.
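As a rough sketch of this step, the code below takes samples from the posterior of a single coefficient and summarises them with a point estimate and an equal-tailed credible interval. The samples here are simulated from a normal distribution purely for illustration; in practice they would come from the fitted model, and the helper name is an assumption.

```python
import math
import random

def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior samples (order statistics)."""
    ordered = sorted(samples)
    n = len(ordered)
    lower_idx = math.floor((1 - level) / 2 * (n - 1))
    upper_idx = math.ceil((1 + level) / 2 * (n - 1))
    return ordered[lower_idx], ordered[upper_idx]

# Hypothetical posterior samples for one log-hazard coefficient.
rng = random.Random(42)
coef_samples = [rng.gauss(-0.3, 0.1) for _ in range(4000)]

point_estimate = sum(coef_samples) / len(coef_samples)  # posterior mean
lower, upper = credible_interval(coef_samples, level=0.95)

# Exponentiating maps the log-hazard coefficient to a hazard ratio.
print(f"coefficient: {point_estimate:.3f} [{lower:.3f}, {upper:.3f}]")
print(f"hazard ratio: {math.exp(point_estimate):.3f} "
      f"[{math.exp(lower):.3f}, {math.exp(upper):.3f}]")
```

Because exponentiation is monotone, the interval for the hazard ratio is obtained by transforming the interval endpoints directly.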
The Bayesian piecewise survival model with horseshoe regularisation offers several advantages. Firstly, it provides greater flexibility in capturing the changing hazard rates over time, resulting in a more realistic portrayal of the survival process. Secondly, the horseshoe regularisation encourages sparsity, effectively pinpointing the most relevant features contributing to the survival outcome while shrinking the influence of irrelevant variables. This leads to a more interpretable model and reduces the risk of overfitting.
Gotta Catch ‘Em All
This class of survival models demonstrates significant efficacy in recall, particularly as time progresses, in contrast to classification models. This is due to survival models being able to capture time-dependent information, such as the duration until an event occurs (e.g., customer churn). As time advances, survival models consistently accumulate survival information, resulting in higher recall rates. Classification models yield a binary prediction at a single time point, lacking the temporal nuance inherent in survival modelling. This temporal perspective enhances the model’s ability to detect churn that may have been missed by a static classification model, which only provides a momentary prediction.
MLOps Friendliness – No Pain More Gain
A single well-constructed survival model streamlines MLOps by replacing the need for multiple classification models. It achieves comparable or superior performance while refining deployment, maintenance, and resource allocation. Overseeing a single model is inherently scalable, facilitating version control, replication, and deployment across varied environments. This minimises inconsistencies and ensures reliable performance. Thus, adopting a well-structured survival model introduces simplicity, efficiency, scalability, and generalisation benefits to MLOps, refines procedures, enhances operational productivity, and simplifies monitoring and revisions.
- Survival models signify a big leap in customer churn prediction.
- These models provide a comprehensive, time-dependent perspective surpassing the limitations of classification models.
- Not only can survival models reveal the timing of churn events, but they also facilitate proactive retention strategies, resource optimization, and improved customer satisfaction.
- Utilising the Bayesian piecewise survival model with horseshoe regularization, combined with variational inference, unlocks the full potential of survival modeling and can revolutionize the approach to customer churn prediction.
- Proactively addressing churn empowers businesses to bolster growth, foster long-term customer relationships, and thrive in a competitive landscape.
- By leveraging the advantages of a single survival model, businesses can refine their MLOps, boost efficiency, and achieve better results.
We are excited to announce that we will be releasing a new implementation soon. Please stay tuned for more details on how it can benefit you!
 Lázaro, Elena, Carmen Armero, and Danilo Alvares. “Bayesian regularization for flexible baseline hazard functions in Cox survival models.” Biometrical Journal 63.1 (2021): 7-26.
 Carvalho, Carlos M., Nicholas G. Polson, and James G. Scott. “Handling sparsity via the horseshoe.” Artificial intelligence and statistics. PMLR, 2009.
 Holford, Theodore R. “The analysis of rates and of survivorship using log-linear models.” Biometrics (1980): 299-305.
- Better Churn Prediction — Using Survival Analysis by Iyar Lin
- Modelling Customer Churn With Survival Analysis by Zach Angell
- Customer Churn Prediction using Survival Analysis by Sarit Maitra
- Survival Regression Analysis on Customer Churn by Thomas J. Fan
- Survival Analysis: Predict Time-To-Event With Machine Learning (Part I) by Lina Faik