Tuesday, December 26, 2017


Concept Note on "Churn Prediction"

Churn prediction analytics is most often used to predict customer disengagement in subscription- or membership-based businesses such as telecom, banking and retail. A churn prediction model identifies, in advance, the customers who are likely to churn, so that strategies can be devised to retain them.

Telecom Scenario

In telecom churn prediction, the input variables selected for analysis depend on the product of interest and are drawn from usage details, revenue and complaints/call centre data. Data is collected from the repository corresponding to the relevant line of business, such as prepaid, postpaid, data or collaboration tools, according to business interest. The main output of a churn solution is the Churn Probability Score for each customer under observation.

Components of Churn Prediction: A churn prediction solution typically covers the following components, as shown in the attached Architecture diagram:
1. Data pre-processing
2. Exploratory data analysis to look at segments and correlated variables
3. Dividing data into training, validation and test sets.
4. Use of models like logistic regression, decision trees, random forests or neural networks for churn prediction. This can be done in any commonly used statistical package such as SAS or R.
5. Validating and refining the model using a confusion matrix, ROC curves, etc.
6. Presentation of approach, outcomes and recommendations in a business-friendly format.
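
As a rough illustration of step 3, the sketch below shuffles a list of customer records and divides it into training, validation and test sets. It is written in Python purely for illustration (the note itself mentions SAS or R); the function name split_dataset, the 60/20/20 proportions, the fixed seed and the toy customer records are all assumptions, not part of the solution described here.

```python
import random


def split_dataset(records, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle records and split them into training, validation and test lists.

    The fractions and the seed are illustrative assumptions; whatever remains
    after the training and validation shares becomes the test set.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")

    shuffled = list(records)                 # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # seeded for a reproducible split

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


# Hypothetical toy data: 100 customers, every fifth one marked as churned.
customers = [{"customer_id": i, "churned": i % 5 == 0} for i in range(100)]
train, validation, test = split_dataset(customers)
```

In practice the split is often stratified on the churn flag so that each set keeps a similar churn rate; this sketch does not do that.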

The basic variables (as given in the raw data) are considered first, and further variables with a significant impact on churn are derived using business logic. The key variable categories are Demographic, Payment, Usage and Complaint. Several churn prediction models are usually built using different machine learning and statistical modeling techniques, and the best model is selected based on lift and prediction accuracy.
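
One common way to read "lift" is the churn rate among the highest-scored customers divided by the overall churn rate. The sketch below assumes that definition and compares two hypothetical models on the same toy labels; the function name top_fraction_lift, the 20% cut-off and all scores are made up for illustration.

```python
def top_fraction_lift(scores, labels, fraction=0.1):
    """Lift = churn rate in the top-scored fraction / overall churn rate."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")

    overall_rate = sum(labels) / len(labels)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no churners")

    # Rank customers from highest to lowest predicted churn score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    return top_rate / overall_rate


# Hypothetical data: 1 = churned, 0 = retained, for ten customers.
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
model_a_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
model_b_scores = [0.5, 0.6, 0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.7, 0.05]

lift_a = top_fraction_lift(model_a_scores, labels, fraction=0.2)
lift_b = top_fraction_lift(model_b_scores, labels, fraction=0.2)
better_model = "A" if lift_a >= lift_b else "B"
```

A lift above 1 means the model concentrates churners in its top-scored group better than random selection would.
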
The accuracy of each model trained on the available data (such as logistic regression, random forests, neural networks or decision trees) is checked on the test data using a confusion matrix, ROC curves and the error measures typical of the model used. The model is then fine-tuned as required.
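
The two checks named above can be sketched without any library. The example below builds a confusion matrix at an assumed 0.5 cut-off and computes the area under the ROC curve as the share of churner/non-churner pairs that the model ranks correctly. The function names, the cut-off and the toy test data are illustrative assumptions.

```python
def confusion_matrix(labels, scores, threshold=0.5):
    """Count true/false positives and negatives at the given score cut-off."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(labels, scores):
        predicted_churn = score >= threshold
        if predicted_churn and label:
            counts["tp"] += 1
        elif predicted_churn and not label:
            counts["fp"] += 1
        elif not predicted_churn and label:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    churner_scores = [s for l, s in zip(labels, scores) if l]
    retained_scores = [s for l, s in zip(labels, scores) if not l]
    if not churner_scores or not retained_scores:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(
        1.0 if c > r else 0.5 if c == r else 0.0
        for c in churner_scores
        for r in retained_scores
    )
    return wins / (len(churner_scores) * len(retained_scores))


# Hypothetical test-set labels (1 = churned) and model scores.
test_labels = [1, 1, 0, 1, 0, 0, 0, 0]
test_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

cm = confusion_matrix(test_labels, test_scores)
accuracy = (cm["tp"] + cm["tn"]) / len(test_labels)
auc = roc_auc(test_labels, test_scores)
```

The pairwise AUC is quadratic in the number of customers, which is fine for a sketch but not for a full subscriber base.
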
Scores for churn management are produced by applying the predict function to the model that has been trained, tested and fine-tuned. Each probability score is the predicted probability that a subscriber will churn. The churn/retention manager can tune the output of this report by specifying a threshold churn risk score, so that only high-risk customers above this score are returned in the report.
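
The threshold-based report can be sketched as a simple filter over the scored customers. The function name high_risk_report, the customer IDs, the scores and the 0.7 threshold below are hypothetical; the only behaviour taken from the description above is that customers scoring above the chosen threshold are returned.

```python
def high_risk_report(scored_customers, threshold):
    """Return (customer_id, score) pairs scoring above threshold, riskiest first."""
    selected = [
        (customer_id, score)
        for customer_id, score in scored_customers.items()
        if score > threshold          # strictly above the threshold, as described
    ]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Hypothetical churn probability scores keyed by customer ID.
churn_scores = {"C001": 0.91, "C002": 0.35, "C003": 0.78, "C004": 0.52}

report = high_risk_report(churn_scores, threshold=0.7)
```

Lowering the threshold widens the report to more customers at the cost of including lower-risk ones.
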
The Churn module is often integrated with the customer segmentation module to provide end-to-end visibility of the data.

Written By:
Abhinav Singh
Lead Consultant
Wipro Analytics