An end-to-end data science project analysing telecom customer churn — from SQL business queries to a deployed XGBoost model served via FastAPI.
From raw data to a production prediction system, every layer is documented and reproducible.
Findings from exploratory analysis and SQL queries across 7,043 telecom customers.
Models tuned with RandomizedSearchCV and evaluated on ROC-AUC — the primary metric for ranking churn risk.
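
To make the choice of metric concrete: ROC-AUC can be read as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, which is why it suits ranking customers by risk. The sketch below computes it by direct pairwise comparison. The `roc_auc` helper and the toy labels and scores are illustrative assumptions, not code or data from this project.

```python
from itertools import product


def roc_auc(labels, scores):
    """Pairwise ROC-AUC: fraction of (churner, non-churner) pairs in which
    the churner gets the higher score. Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = churned, 0 = stayed.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.90, 0.20, 0.65, 0.70, 0.10, 0.80]

auc = roc_auc(labels, scores)
print(f"ROC-AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of customers, so it is only meant to show what the metric measures. A library routine such as scikit-learn's `roc_auc_score` would be the practical choice on the full dataset.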
| Model | Accuracy | Precision | Recall | F1 Score | ROC-AUC | Status |
|---|---|---|---|---|---|---|
| XGBoost (Tuned) | 75.87% | 46.96% | 84.76% | 60.44% | 84.47% | Production |
| Random Forest (Tuned) | 77.00% | 55.30% | 69.79% | 61.70% | 84.12% | Runner-up |
| Logistic Regression | 73.81% | 50.42% | 79.68% | 61.76% | 83.99% | Baseline |
| LightGBM (Tuned) | 75.94% | 53.41% | 73.26% | 61.78% | 83.54% | Compared |
| SVM | 74.24% | 50.98% | 76.20% | 61.09% | 81.66% | Compared |
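
As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The snippet below does that using only the figures reported above. The `f1_score` helper is written here for illustration and is not taken from the project code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the table above.
reported = {
    "XGBoost (Tuned)": (46.96, 84.76, 60.44),
    "Random Forest (Tuned)": (55.30, 69.79, 61.70),
    "Logistic Regression": (50.42, 79.68, 61.76),
    "LightGBM (Tuned)": (53.41, 73.26, 61.78),
    "SVM": (50.98, 76.20, 61.09),
}

# Largest gap between recomputed and reported F1, in percentage points.
max_gap = 0.0
for model, (precision, recall, reported_f1) in reported.items():
    recomputed = f1_score(precision, recall)
    max_gap = max(max_gap, abs(recomputed - reported_f1))
    print(f"{model:<22} reported {reported_f1:.2f}  recomputed {recomputed:.2f}")
```

Small differences in the last digit are expected, because the precision and recall values in the table are themselves rounded. The table also shows the trade-off behind the production choice: XGBoost has the lowest precision but the highest recall and ROC-AUC.
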
The tuned XGBoost model is wrapped as a production REST API built with FastAPI, with Swagger documentation, input validation via Pydantic, and batch prediction support.
{
  "churn_prediction": true,
  "churn_probability": 0.8968,
  "churn_probability_pct": "89.7%",
  "risk_level": "High",
  "recommendation": "Immediate retention action required.",
  "model_used": "XGBoost (Tuned)"
}
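
One way a response of this shape could be assembled from a raw model probability is sketched below. Only the field names and the "High" recommendation text come from the sample response. The 0.5 decision threshold, the 0.4 and 0.7 risk cut-offs, the Medium and Low recommendation strings, and the `build_response` helper itself are assumptions made for illustration, not the project's actual logic.

```python
import json

# Hypothetical recommendation texts; only the "High" entry appears in the sample response.
RECOMMENDATIONS = {
    "High": "Immediate retention action required.",
    "Medium": "Placeholder: monitor the account and consider an offer.",
    "Low": "Placeholder: no action needed.",
}


def risk_level(probability, medium_at=0.4, high_at=0.7):
    """Map a churn probability to a risk band (cut-offs are assumed values)."""
    if probability >= high_at:
        return "High"
    if probability >= medium_at:
        return "Medium"
    return "Low"


def build_response(probability, threshold=0.5, model_name="XGBoost (Tuned)"):
    """Build a response dict with the same fields as the sample above."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    level = risk_level(probability)
    return {
        "churn_prediction": probability >= threshold,
        "churn_probability": round(probability, 4),
        "churn_probability_pct": f"{probability * 100:.1f}%",
        "risk_level": level,
        "recommendation": RECOMMENDATIONS[level],
        "model_used": model_name,
    }


response = build_response(0.8968)
print(json.dumps(response, indent=2))
```

In a FastAPI service, a dict like this would typically be returned from the endpoint handler, with a Pydantic model describing the same fields so that they appear in the Swagger documentation.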
Key findings from SQL analysis mapped to actionable retention strategies.
Every layer of the project relies on industry-standard tools found in production data science environments.