Obesity risk estimation using ensemble learning and synthetic data augmentation techniques
DOI:
https://doi.org/10.35335/1bg4ws75
Keywords:
Augmentation, Classification, Ensemble Learning, Imbalanced Data, Obesity
Abstract
Obesity has become a primary global health concern because of its strong association with chronic diseases such as diabetes, cardiovascular disorders, and certain types of cancer. Accurate and early prediction of obesity risk is essential for effective prevention and intervention strategies. However, predictive modeling in this domain faces two critical challenges: imbalanced class distributions and complex, nonlinear relationships among behavioral and anthropometric features. This study addresses these challenges by developing a robust classification model that integrates ensemble learning with synthetic data augmentation. The study uses the Obesity Dataset from Kaggle, which comprises 2,111 records labeled with seven obesity levels and exhibits a realistic class imbalance. Preprocessing included data cleaning, encoding, and stratified splitting. To improve minority-class representation, two augmentation methods were applied: the Synthetic Minority Over-sampling Technique (SMOTE) and Generative Adversarial Networks (GANs), which generate realistic synthetic minority samples. A stacking ensemble was constructed with Random Forest and XGBoost as base learners and Logistic Regression as the meta-learner. Hyperparameters were tuned with both grid search and randomized search. Performance was evaluated with accuracy, precision, recall, and F1-score. The proposed model achieved an accuracy of 91% and an F1-score of 0.89, outperforming the models reported in previous studies. These findings suggest that combining ensemble learning with hybrid augmentation strategies effectively addresses class imbalance and improves predictive reliability in obesity risk estimation. The model has practical value as a decision-support tool for early screening and targeted intervention in obesity prevention programs.
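The abstract above lists stratified splitting as a preprocessing step. The following is a minimal sketch in standard-library Python, not the authors' code: the helper name `stratified_split`, the 20% test fraction, and the seed are illustrative assumptions. Splitting each class separately keeps the class proportions of an imbalanced dataset similar in the training and test sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), preserving each class's share in both sets.

    Hypothetical helper; test_fraction and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        # Each class contributes (approximately) test_fraction of its rows to the test set.
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


if __name__ == "__main__":
    # Toy imbalanced labels: 50 / 30 / 20 rows.
    labels = ["Normal"] * 50 + ["Overweight"] * 30 + ["Obese"] * 20
    train, test = stratified_split(labels)
    print(len(train), len(test))  # 80 20
```

In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument does the same job.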
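SMOTE, one of the two augmentation methods named in the abstract above, creates a synthetic minority sample by interpolating between a minority sample and one of its k nearest minority-class neighbors. The sketch below is an illustration under stated assumptions, not the implementation used in the study: it assumes all features are already numerically encoded, and the function name `smote`, k = 5, and the seed are illustrative choices. Plain SMOTE interpolation is intended for continuous features, so encoded categorical columns may call for a variant such as SMOTE-NC.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of numeric feature vectors from ONE class.
    k and seed are illustrative defaults (assumptions).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x (excluding x itself), Euclidean distance.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbors)]
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment between x and the chosen neighbor.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
    for point in smote(minority, n_new=3, k=2):
        print(point)
```

Because every synthetic point lies between two existing minority samples, SMOTE fills in the minority region rather than duplicating rows.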
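In a stacking ensemble such as the one described in the abstract above, the meta-learner is trained on the base learners' predictions. To avoid leakage, those predictions are usually produced out of fold: each row is scored by base models that did not see it during training. The study uses Random Forest and XGBoost as base learners and Logistic Regression as the meta-learner; this sketch swaps in two hypothetical toy learners (k-NN class frequencies and a nearest-centroid softmax) so it runs with the standard library only, and it uses simple interleaved folds rather than stratified ones. It builds only the meta-feature matrix; the meta-learner would then be fit on `meta` against `y`.

```python
import math
from collections import Counter


def knn_learner(k):
    """Toy stand-in base learner (hypothetical): k-NN class frequencies."""
    def fit(X, y, classes):
        def predict_proba(x):
            nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
            counts = Counter(y[i] for i in nearest)
            return [counts[c] / len(nearest) for c in classes]
        return predict_proba
    return fit


def centroid_learner(X, y, classes):
    """Toy stand-in base learner (hypothetical): softmax of -distance to class centroids."""
    centroids = {}
    for c in classes:
        rows = [x for x, t in zip(X, y) if t == c]
        if rows:
            centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict_proba(x):
        dists = {c: math.dist(x, m) for c, m in centroids.items()}
        d_min = min(dists.values())
        # Shift by the smallest distance so exp() cannot underflow to all zeros.
        scores = {c: math.exp(-(d - d_min)) for c, d in dists.items()}
        total = sum(scores.values())
        return [scores.get(c, 0.0) / total for c in classes]
    return predict_proba


def out_of_fold_meta_features(X, y, learners, n_folds=5):
    """Build stacking meta-features from out-of-fold base-learner probabilities.

    Each row's meta-features come from base models that never saw that row.
    """
    classes = sorted(set(y))
    meta = [None] * len(X)
    for fold in range(n_folds):
        held_out = [i for i in range(len(X)) if i % n_folds == fold]
        train = [i for i in range(len(X)) if i % n_folds != fold]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        models = [fit(X_tr, y_tr, classes) for fit in learners]
        for i in held_out:
            # Concatenate every base model's class probabilities for row i.
            meta[i] = [p for model in models for p in model(X[i])]
    return meta, classes


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    y = ["normal", "normal", "normal", "obese", "obese", "obese"]
    meta, classes = out_of_fold_meta_features(
        X, y, [knn_learner(1), centroid_learner], n_folds=3
    )
    for row in meta:
        print([round(p, 3) for p in row])
```

scikit-learn's `StackingClassifier` automates the same out-of-fold construction through its `cv` parameter.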
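The abstract above mentions both grid search and randomized search. Grid search scores every combination of candidate hyperparameter values, whereas randomized search scores a fixed number of sampled configurations, trading exhaustiveness for cost. In this sketch the parameter names resemble XGBoost's but the values are illustrative, randomized search samples from the same grid for easy comparison (real implementations often sample from continuous distributions), and `toy_score` is a hypothetical stand-in for a cross-validated metric such as F1. None of it reflects the study's actual search space.

```python
import itertools
import random


def grid_search(space, score):
    """Score every combination in `space` (name -> list of candidate values)."""
    names = sorted(space)
    candidates = [
        dict(zip(names, combo))
        for combo in itertools.product(*(space[n] for n in names))
    ]
    return max(candidates, key=score), len(candidates)


def random_search(space, score, n_iter=10, seed=0):
    """Score n_iter configurations sampled uniformly (with replacement) from the grid."""
    rng = random.Random(seed)
    names = sorted(space)
    candidates = [{n: rng.choice(space[n]) for n in names} for _ in range(n_iter)]
    return max(candidates, key=score), len(candidates)


def toy_score(cfg):
    """Hypothetical stand-in for a cross-validated F1; peaks at depth 6, rate 0.1."""
    return -(cfg["max_depth"] - 6) ** 2 - abs(cfg["learning_rate"] - 0.1)


if __name__ == "__main__":
    space = {"max_depth": [3, 6, 9], "learning_rate": [0.01, 0.1, 0.3]}
    print(grid_search(space, toy_score))               # all 9 configurations scored
    print(random_search(space, toy_score, n_iter=4))   # only 4 configurations scored
```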
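The abstract above reports accuracy and F1-score but does not say how per-class scores are averaged across the seven obesity levels. The sketch below assumes macro averaging, which weights each class equally and therefore exposes weak minority-class performance that accuracy alone can hide. The function name `classification_metrics` and its return layout are illustrative.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus per-class precision, recall, F1 and their macro averages."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    per_class = {}
    for c in classes:
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    n = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(v[0] for v in per_class.values()) / n,
        "macro_recall": sum(v[1] for v in per_class.values()) / n,
        "macro_f1": sum(v[2] for v in per_class.values()) / n,
        "per_class": per_class,
    }


if __name__ == "__main__":
    y_true = ["a", "a", "a", "b", "b", "c"]
    y_pred = ["a", "a", "b", "b", "b", "c"]
    m = classification_metrics(y_true, y_pred)
    print(round(m["accuracy"], 4), round(m["macro_f1"], 4))  # 0.8333 0.8667
```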
License
Copyright (c) 2023 Nur Tulus Ujianto, Gunawan Gunawan, Wresti Andriani, Ivan Rizky Ramadhani, Nasichatun Nasichatun (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
