Data Science · End-to-End Project

Customer Churn Analysis & Prediction

An end-to-end data science project analysing telecom customer churn — from SQL business queries to a deployed XGBoost model served via FastAPI.

26.5%
Churn Rate
$139k
Monthly Revenue at Risk
84.5%
Model ROC-AUC
84.8%
Model Recall

8-Week End-to-End Workflow

From raw data to a production prediction system, every layer is documented and reproducible.

WEEK 01 — 02
Data & SQL
Data cleaning pipeline, 10-section EDA, and 10 SQL business queries against SQLite.
Pandas · SQLite · Seaborn
WEEK 03 — 04
Dashboard
Interactive Streamlit dashboard with filters, KPI cards, and insight banners. Deployed on Streamlit Cloud.
Streamlit · Plotly
WEEK 05 — 06
ML Modelling
5-model comparison, hyperparameter tuning via RandomizedSearchCV, SHAP explainability.
XGBoost · SHAP · Sklearn
WEEK 07 — 08
API & Predictor
FastAPI REST API with Swagger docs. Live prediction UI integrated into the Streamlit dashboard.
FastAPI · Uvicorn · Pydantic

Key Results

Findings from exploratory analysis and SQL queries across 7,043 telecom customers.

Overall Churn Rate
26.5%
1,869 of 7,043 customers churned
Revenue at Risk
$139,131
Monthly revenue from churned customers
Month-to-Month Churn
42.7%
Nearly 4x the 11% churn rate on one-year contracts
New Customer Churn
47.4%
Customers in first 12 months
Electronic Check Churn
45.3%
vs 15% for auto-pay customers
Avg Charge — Churned
$74.44
$13.17 higher than retained customers
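
Figures like the churn rate per contract type and the monthly revenue at risk can each come from a single aggregate query. The sketch below is a minimal illustration, not the project's actual query: the table name `customers`, its columns (`contract`, `monthly_charges`, `churn`), and the eight toy rows are all assumptions made up for the example, run against an in-memory SQLite database.

```python
import sqlite3

# Hypothetical schema and toy rows, invented for illustration only.
ROWS = [
    ("Month-to-month", 70.0, "Yes"),
    ("Month-to-month", 80.0, "Yes"),
    ("Month-to-month", 50.0, "No"),
    ("Month-to-month", 60.0, "No"),
    ("One year", 90.0, "Yes"),
    ("One year", 40.0, "No"),
    ("One year", 45.0, "No"),
    ("One year", 55.0, "No"),
]

# Churn rate (%) and monthly revenue at risk per contract type.
QUERY = """
SELECT contract,
       COUNT(*) AS customers,
       ROUND(100.0 * SUM(CASE WHEN churn = 'Yes' THEN 1 ELSE 0 END) / COUNT(*), 1)
           AS churn_rate_pct,
       SUM(CASE WHEN churn = 'Yes' THEN monthly_charges ELSE 0 END)
           AS revenue_at_risk
FROM customers
GROUP BY contract
ORDER BY churn_rate_pct DESC
"""


def churn_by_contract(rows):
    """Load rows into an in-memory SQLite table and run the aggregate query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE customers (contract TEXT, monthly_charges REAL, churn TEXT)"
        )
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


summary = churn_by_contract(ROWS)
for contract, customers, churn_rate_pct, revenue_at_risk in summary:
    print(f"{contract}: {churn_rate_pct}% churn, "
          f"${revenue_at_risk:,.2f} monthly revenue at risk")
```

The same query shape applies to any other grouping column, such as payment method or tenure band.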

5-Model Comparison

Models tuned with RandomizedSearchCV and evaluated on ROC-AUC — the primary metric for ranking churn risk.

Model                 | Accuracy | Precision | Recall | F1 Score | ROC-AUC | Status
XGBoost (Tuned)       | 75.87%   | 46.96%    | 84.76% | 60.44%   | 84.47%  | Production
Random Forest (Tuned) | 77.00%   | 55.30%    | 69.79% | 61.70%   | 84.12%  | Runner-up
Logistic Regression   | 73.81%   | 50.42%    | 79.68% | 61.76%   | 83.99%  | Baseline
LightGBM (Tuned)      | 75.94%   | 53.41%    | 73.26% | 61.78%   | 83.54%  | Compared
SVM                   | 74.24%   | 50.98%    | 76.20% | 61.09%   | 81.66%  | Compared
Selection Rationale: ROC-AUC was chosen as the primary metric because the retention team contacts a fixed number of customers per month, so the model's ability to rank customers by churn risk directly determines campaign ROI. XGBoost (Tuned) achieved the highest ROC-AUC (84.47%) and the highest recall (84.76%), meaning it correctly identifies about 85 out of every 100 churners before they leave. The trade-off is the lowest precision of the five models (46.96%): roughly half of the customers it flags do not churn.
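
To make the link between ROC-AUC and a fixed contact budget concrete, here is a small pure-Python sketch. It computes ROC-AUC from its pairwise definition (the share of churner and non-churner pairs where the churner gets the higher score) and the share of churners reached when only the top k customers are contacted. The labels and scores are made-up toy values and both function names are hypothetical; in the project's stack, `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (churner, non-churner) pairs ranked correctly.

    Ties between a churner and a non-churner count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one churner and one non-churner")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def churners_captured(labels, scores, k):
    """Share of all churners found among the k highest-scored customers."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    return hits / sum(labels)


# Toy data: 1 = churned, 0 = retained; scores are invented churn probabilities.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.90, 0.80, 0.70, 0.40, 0.30, 0.60, 0.20, 0.10]

print(f"ROC-AUC: {roc_auc(labels, scores):.3f}")
print(f"Churners reached with 3 contacts: {churners_captured(labels, scores, 3):.0%}")
print(f"Churners reached with 4 contacts: {churners_captured(labels, scores, 4):.0%}")
```

A model with a higher ROC-AUC tends to place more churners near the top of the list, which is what matters when only the first k customers can be contacted.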

FastAPI Prediction Service

XGBoost model wrapped as a production REST API with Swagger documentation, input validation via Pydantic, and batch prediction support.

Endpoints
GET / Health check
GET /model-info Model metadata
POST /predict Single prediction
POST /predict-batch Up to 100 customers
Sample Response
{
  "churn_prediction": true,
  "churn_probability": 0.8968,
  "churn_probability_pct": "89.7%",
  "risk_level": "High",
  "recommendation": "Immediate retention action required.",
  "model_used": "XGBoost (Tuned)"
}
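
The sample response implies a mapping from the model's raw probability to a boolean prediction, a formatted percentage, a risk band, and a recommendation. The sketch below shows one way that mapping could look. Only the field names and the "High" recommendation text come from the sample; the 0.5 decision threshold, the 0.4 and 0.7 risk-band cut-offs, the Medium and Low wording, and the function name `build_response` are assumptions for illustration.

```python
import json

MODEL_NAME = "XGBoost (Tuned)"

# Assumed thresholds; the project's real cut-offs are not stated on this page.
DECISION_THRESHOLD = 0.5
RISK_BANDS = [
    # (minimum probability, risk level, recommendation)
    (0.7, "High", "Immediate retention action required."),
    (0.4, "Medium", "Monitor and consider a targeted offer."),  # placeholder wording
    (0.0, "Low", "No action needed."),  # placeholder wording
]


def build_response(probability: float) -> dict:
    """Turn a churn probability into a payload shaped like the sample response."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Bands are ordered from highest floor to lowest, so the first match wins.
    for floor, risk_level, recommendation in RISK_BANDS:
        if probability >= floor:
            break
    return {
        "churn_prediction": probability >= DECISION_THRESHOLD,
        "churn_probability": round(probability, 4),
        "churn_probability_pct": f"{probability:.1%}",
        "risk_level": risk_level,
        "recommendation": recommendation,
        "model_used": MODEL_NAME,
    }


response = build_response(0.8968)
print(json.dumps(response, indent=2))
```

In a FastAPI service, a function like this would typically sit behind the `POST /predict` handler, with Pydantic validating the request body before the model is called.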

High-Risk Segments

Key findings from SQL analysis mapped to actionable retention strategies.

42.7%
Month-to-Month Contracts
Offer first-month discounts or loyalty credits to move customers to annual plans. Churn drops to 11% on one-year contracts.
47.4%
First 12 Months (New Customers)
Implement a structured onboarding programme with proactive check-ins at months 1, 3, and 6 to reduce early churn.
41.0%
Fiber Optic Without Tech Support
Proactively upsell Tech Support to Fiber Optic customers. Churn drops from 41% to 18% with Tech Support — the highest ROI retention action.
45.3%
Electronic Check Payment
Offer a 5% bill discount to switch to automatic payment. Electronic check users churn at 3x the rate of credit card auto-pay customers.
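
The segment comparisons above reduce to simple arithmetic on pairs of churn rates. As a quick check, the sketch below computes the relative risk and the gap in percentage points for each pair, using the rounded rates exactly as quoted in the cards; the dictionary keys and the `compare` helper are names made up for this example.

```python
# Churn rates (in %) as quoted in the cards: (segment rate, comparison rate).
SEGMENTS = {
    "Month-to-month vs one-year contract": (42.7, 11.0),
    "Fiber optic without vs with Tech Support": (41.0, 18.0),
    "Electronic check vs auto-pay": (45.3, 15.0),
}


def compare(segment_rate, baseline_rate):
    """Return (relative risk, absolute gap in percentage points), rounded to 1 dp."""
    ratio = round(segment_rate / baseline_rate, 1)
    gap = round(segment_rate - baseline_rate, 1)
    return ratio, gap


comparison = {name: compare(*rates) for name, rates in SEGMENTS.items()}
for name, (ratio, gap) in comparison.items():
    print(f"{name}: {ratio}x the churn rate, {gap} points higher")
```

Because the inputs are already rounded, the ratios are approximate and should be recomputed from the underlying counts before being used in a business case.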

Tech Stack

Every layer of the project is built with industry-standard tools common in production data science environments.

Data & Analysis
Python 3.11 · Pandas · NumPy · SQLite · SQLAlchemy
Visualisation
Plotly · Seaborn · Matplotlib · Streamlit
Machine Learning
XGBoost · LightGBM · Scikit-learn · SHAP
API & Deployment
FastAPI · Uvicorn · Pydantic · Streamlit Cloud
Tuning & Validation
RandomizedSearchCV · StratifiedKFold · Learning Curves
Version Control
Git · GitHub · GitHub Pages