Obesity risk estimation using ensemble learning and synthetic data augmentation techniques

Authors

  • Nur Tulus Ujianto, Universitas Pancasakti Tegal, Kota Tegal, Indonesia
  • Gunawan Gunawan, Universitas Pancasakti Tegal, Kota Tegal, Indonesia, https://orcid.org/0009-0004-3132-3854
  • Wresti Andriani, Universitas Pancasakti Tegal, Kota Tegal, Indonesia
  • Ivan Rizky Ramadhani, Universitas Pancasakti Tegal, Kota Tegal, Indonesia
  • Nasichatun Nasichatun, Universitas Pancasakti Tegal, Kota Tegal, Indonesia

DOI:

https://doi.org/10.35335/1bg4ws75

Keywords:

Augmentation, Classification, Ensemble Learning, Imbalanced Data, Obesity

Abstract

Obesity has become a major global health concern because of its strong association with chronic diseases such as diabetes, cardiovascular disorders, and certain types of cancer. Accurate, early prediction of obesity risk is essential for effective prevention and intervention. Predictive modeling in this domain, however, faces two critical challenges: imbalanced datasets and the complex, nonlinear nature of behavioral and anthropometric features. This study addresses these challenges by developing a classification model that integrates ensemble learning with synthetic data augmentation. The research uses the Obesity Dataset from Kaggle, which comprises 2,111 records labeled with seven obesity levels and exhibits a realistic class imbalance. Preprocessing included data cleaning, encoding, and stratified splitting. To improve class representation, two augmentation methods were applied: SMOTE for synthetic oversampling and Generative Adversarial Networks (GANs) for generating realistic minority-class samples. A stacking ensemble was constructed with Random Forest and XGBoost as base learners and Logistic Regression as the meta-learner. Hyperparameters were optimized using both grid search and randomized search. Performance was assessed using accuracy, precision, recall, and F1-score. The proposed model achieved 91% accuracy and an F1-score of 0.89, outperforming models reported in previous studies. These findings suggest that combining ensemble learning with hybrid augmentation strategies effectively addresses class imbalance and improves predictive reliability in obesity risk estimation. The developed model has practical value as a decision-support tool for early screening and targeted intervention in obesity prevention programs.
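
To illustrate the synthetic oversampling step mentioned in the abstract, the following is a minimal sketch of the core SMOTE idea: each new minority-class point is interpolated between a real sample and one of its nearest same-class neighbours. It is not the authors' pipeline. The function name `smote_like_oversample`, the toy data, and the class labels are hypothetical, and the sketch assumes purely numeric features, whereas the real dataset also contains categorical columns that would need separate handling.

```python
import math
import random
from collections import Counter


def smote_like_oversample(X, y, k=5, seed=0):
    """Oversample every non-majority class up to the majority class size.

    Each synthetic point lies on the segment between a real minority sample
    and one of its k nearest same-class neighbours (the core SMOTE idea).
    Assumption: all features are numeric.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)

    for label, n in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if n == target or len(members) < 2:
            continue  # majority class, or too few points to interpolate
        for _ in range(target - n):
            base = rng.choice(members)
            # k nearest same-class neighbours of `base`, excluding itself
            others = [m for m in members if m is not base]
            others.sort(key=lambda m: math.dist(base, m))
            neighbour = rng.choice(others[:k])
            gap = rng.random()  # interpolation weight in [0, 1)
            X_out.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
            y_out.append(label)
    return X_out, y_out


# Toy data with hypothetical class names; not the Kaggle obesity records.
X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2], [5.0, 5.0], [5.2, 4.8]]
y = ["normal", "normal", "normal", "normal", "obese", "obese"]
X_bal, y_bal = smote_like_oversample(X, y, k=3, seed=42)
print(Counter(y_bal))
```

In this toy run both classes end up with four samples each. In a setup like the one the abstract describes, oversampling of this kind would normally be applied to the training split only, after the stratified split, so that synthetic points do not leak into the evaluation data.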

Published

2025-06-30

How to Cite

Obesity risk estimation using ensemble learning and synthetic data augmentation techniques. (2025). Vertex, 14(2), 96-105. https://doi.org/10.35335/1bg4ws75
