Predicting Road Accidents and Analyzing their Patterns Using Supervised Machine Learning

dc.contributor.author Oyoo, James Oduor
dc.date.accessioned 2025-12-03T13:35:36Z
dc.date.available 2025-12-03T13:35:36Z
dc.date.issued 2025-12-03
dc.identifier.citation OyooJO2025 en_US
dc.identifier.uri http://localhost/xmlui/handle/123456789/6876
dc.description Master of Science in Computer Systems en_US
dc.description.abstract Road traffic collisions are among the most serious problems facing the world. They cause many fatalities, injuries, and financial losses, with low- and middle-income countries (LMICs) bearing a disproportionate share of the cases. Previous studies have examined the problem by applying various methods and strategies to different road sections and crossings. Conventional methods such as logit and probit models have been used extensively to predict road accidents. Nevertheless, these techniques have limitations: they require a predetermined mathematical form, and they are sensitive to missing values and outliers in the dataset, which negatively affect the outcomes of the prediction model. Unlike statistical techniques, machine learning (ML) techniques can handle outliers and missing values in the dataset. Designing accurate predictive models for road accidents is an important task for the transportation network, and it has prompted researchers to develop prediction models (PMs) and to study the factors that contribute to these accidents. This thesis therefore aims to develop and evaluate a prediction model using an ensemble ML technique that incorporates supervised ML algorithms such as AdaBoost, K-Nearest Neighbors (K-NN), Decision Trees (DT), and Naive Bayes (NB) to predict road accidents and their patterns. A driving simulator was used as the research instrument to collect data. The collected data was then normalized and cleaned for analysis using the scikit-learn Python library. The synthetic minority oversampling technique (SMOTE) was employed to address data imbalance before training the model. The particle swarm optimization (PSO) algorithm was used to identify the most important features in the dataset. Four primary performance indicators (testing accuracy, precision, recall, and F1 score) were used to assess the models and compare their outcomes.
The findings of this study indicate that the two-layer ensemble model outperforms the four base classification models on all four performance indicators, with 88% testing accuracy, 86% precision, 83% recall, and 84% F1 score. The proposed two-layer ensemble model can be used in the future for both theoretical and practical applications, such as road safety management to improve the existing condition of the road network and to inform the formulation of evidence-based traffic safety policies. Ultimately, the results showed that ML-based models outperformed statistical models. Keywords: machine learning, data imbalance, road safety, driving simulation, SMOTE en_US
dc.description.sponsorship Dr. Kennedy Ogada Odhiambo, PhD, JKUAT, Kenya; Dr. Jael Sanyanda Wekesa, PhD, JKUAT, Kenya en_US
dc.language.iso en en_US
dc.publisher JKUAT-COPAS en_US
dc.subject Predicting Road Accidents en_US
dc.subject Supervised Machine Learning en_US
dc.subject Analyzing their Patterns en_US
dc.subject SMOTE en_US
dc.title Predicting Road Accidents and Analyzing their Patterns Using Supervised Machine Learning en_US
dc.type Thesis en_US
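
The abstract names two steps that a small example may help make concrete: SMOTE oversampling and the four evaluation indicators. The following is a minimal sketch in plain Python, not the thesis's own code. The function names smote_like_oversample and binary_metrics, the parameter choices, and all data values are hypothetical and chosen only for illustration. The first function shows the core SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours); the second computes accuracy, precision, recall, and F1 score for a binary label.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data (made up for illustration only): three minority-class points.
minority_points = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8)]
new_points = smote_like_oversample(minority_points, n_new=4, k=2)

# Toy labels (made up): 1 = accident, 0 = no accident.
scores = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
print(new_points)
print(scores)
```

In practice a maintained implementation would normally be preferred over a hand-written one; the sketch only shows what the oversampling step and the indicators compute.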


This item appears in the following Collection(s)

  • College of Health Sciences JKUAT (COHES) [850]
    Medical Laboratory; Agriculture & Environmental Biotechnology; Biochemistry; Molecular Medicine; Applied Epidemiology; Medicinal Phytochemistry; Public Health
