Investigating Data Balancing Effects for Enhanced Behavioural Risk Detection in Cervical Cancer Using BiGRU: A Pilot Study
DOI: https://doi.org/10.37933/nipes/7.2.2025.24
Abstract
Cervical cancer is a growth of cells that starts in the cervix, the lower part of the uterus that connects to the vagina; its most common cause is the human papillomavirus (HPV). It is an easily treatable non-communicable disease when detected early. Even with traditional testing performed through Pap screening, cervical cancer accounts for over 265,000 deaths in women annually in Nigeria, presenting with troubled breathing, changes in bladder habits, weight loss, lumps, area thickening, etc. Immediate action is therefore required through early detection and warning to (pre)carrier patients. Machine learning models are often hampered by imbalanced class distributions, whose effects ripple across the model as poor generalization and high misclassification. Our study explores the UCI datasets to investigate the effects of data balancing on behavioral risk detection in cervical cancer, using six known modes (RUS, UPS, SMOTE, ADASYN, SMOTE-Tomek, and SMOTE-ENN) to resolve dataset imbalance and to evaluate how well each balancing mode improves performance and generalization with the bi-directional gated recurrent unit (BiGRU). Results show that the SMOTE-Tomek links approach yielded the best values, with a harmonic mean (F1-score) of 0.8189 and an Accuracy of 0.8182, with Recall, Precision, and Specificity values of 0.8028, 0.8048, and 0.8200 respectively, correctly classifying 267 instances with 12 incorrectly classified instances. The model effectively identified cervical cancer risk behaviors, with empirical results depending on the balancing approach used, and it performed best with the SMOTE-Tomek approach.
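To make the balancing and evaluation steps concrete, the following is a minimal sketch of random undersampling (RUS) and of the reported metrics computed from confusion-matrix counts. It is not the study's implementation: the function names `random_undersample` and `binary_metrics`, the toy rows, and the counts are hypothetical and chosen only for illustration, and only the Python standard library is assumed.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class (RUS)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample without replacement so no row is duplicated.
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, Precision, Recall, Specificity, and F1 from counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def safe_div(num, den):
        # Return 0.0 when a denominator is zero (e.g. no positives).
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        # F1 is the harmonic mean of Precision and Recall.
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical imbalanced toy data: 8 low-risk (0) rows, 2 high-risk (1) rows.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_undersample(rows, labels)

# Hypothetical confusion-matrix counts, not taken from the study.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Undersampling discards majority-class rows, whereas oversampling modes such as SMOTE synthesize minority-class rows instead; in either case the balancing step would normally be applied to the training split only, so that the metrics are computed on untouched test data.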