Improving Imbalanced Data Classification in Auto Insurance by the Data Level Approaches.

Research Authors

Mohamed Hanafy

Research Date

Tue, 06/15/2021 - 12:00

Research Department

Department of Statistics, Mathematics and Insurance

Research File

Paper_56-Improving_Imbalanced_Data_Classification_1.pdf

Research Member

Mohamed Hanfi Kotb Ibrahim

Research Website

https://thesai.org/Publications/ViewPaper?Volume=12&Issue=6&Code=IJACSA&SerialNo=56

Research Abstract

Predicting the frequency of insurance claims has become a significant challenge because insurance datasets are imbalanced: the number of claims that occur is usually much lower than the number that do not. As a result, classification models tend to have a limited ability to predict claim occurrence. In this paper, we apply several data-level approaches to address the imbalanced data problem in the insurance industry. We developed 32 machine learning models for predicting insurance claim occurrence by combining four resampling methods (undersampling, oversampling, a hybrid combination of over- and undersampling, and SMOTE) with eight classifiers (three decision tree models, three boosting models, and two bagging models), giving 4 × 8 = 32 models. We compared the models' accuracies, sensitivities, and specificities to assess their prediction performance. The dataset contains 81,628 car insurance records: 5,714 in which a claim occurred and 75,914 in which no claim occurred. According to the findings, the AdaBoost classifier gave the most accurate predictions, both with oversampling (sensitivity 92.94%, specificity 99.82%, accuracy 99.4%) and with the hybrid method (sensitivity 92.48%, specificity 99.63%, accuracy 99.1%). This paper showed that, when analyzing imbalanced data, the AdaBoost classifier with either oversampling or the hybrid method could generate more accurate models than the other boosting, decision tree, and bagging models.
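The abstract names four data-level resampling methods and three evaluation metrics without spelling them out. The following is a minimal, standard-library-only sketch of the textbook versions of these techniques, not the authors' code. It assumes numeric feature vectors and label 1 for "claim occurred" (the minority class). The function names, the midpoint target used for the hybrid method, and k = 5 neighbours for SMOTE are hypothetical choices for illustration; in practice a tested library such as imbalanced-learn (`RandomUnderSampler`, `RandomOverSampler`, `SMOTE`) would normally be used instead.

```python
import random
from typing import List, Sequence, Tuple

# One record: (numeric feature vector, label). Assumption: label 1 = claim
# occurred (minority class), label 0 = no claim (majority class).
Row = Tuple[List[float], int]


def split_by_class(data: Sequence[Row]) -> Tuple[List[Row], List[Row]]:
    minority = [r for r in data if r[1] == 1]
    majority = [r for r in data if r[1] == 0]
    return minority, majority


def random_undersample(data: Sequence[Row], rng: random.Random) -> List[Row]:
    """Drop majority rows at random (without replacement) until classes match."""
    minority, majority = split_by_class(data)
    return minority + rng.sample(majority, len(minority))


def random_oversample(data: Sequence[Row], rng: random.Random) -> List[Row]:
    """Duplicate minority rows at random (with replacement) until classes match."""
    minority, majority = split_by_class(data)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def hybrid_sample(data: Sequence[Row], rng: random.Random) -> List[Row]:
    """Undersample the majority and oversample the minority to a common size
    (hypothetical choice: the midpoint of the two class sizes)."""
    minority, majority = split_by_class(data)
    target = (len(minority) + len(majority)) // 2
    kept = rng.sample(majority, target)
    grown = minority + [rng.choice(minority) for _ in range(target - len(minority))]
    return kept + grown


def _sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(data: Sequence[Row], rng: random.Random, k: int = 5) -> List[Row]:
    """SMOTE: create synthetic minority rows on the line segment between a
    minority row and one of its k nearest minority neighbours."""
    minority, majority = split_by_class(data)
    feats = [f for f, _ in minority]
    synthetic: List[Row] = []
    for _ in range(len(majority) - len(minority)):
        i = rng.randrange(len(feats))
        # Rank the other minority rows by squared Euclidean distance to row i.
        others = sorted((j for j in range(len(feats)) if j != i),
                        key=lambda j: _sq_dist(feats[i], feats[j]))
        j = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        new = [a + gap * (b - a) for a, b in zip(feats[i], feats[j])]
        synthetic.append((new, 1))
    return majority + minority + synthetic


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy), treating label 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # share of actual claims predicted as claims
    specificity = tn / (tn + fp)  # share of non-claims predicted as non-claims
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data with roughly the paper's 1 : 13 class ratio (not the real dataset).
    data: List[Row] = [([rng.gauss(1, 1), rng.gauss(1, 1)], 1) for _ in range(10)]
    data += [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(130)]
    for name, method in [("undersampling", random_undersample),
                         ("oversampling", random_oversample),
                         ("hybrid", hybrid_sample),
                         ("SMOTE", smote)]:
        balanced = method(data, rng)
        n_pos = sum(label for _, label in balanced)
        print(f"{name:13s} claims={n_pos} no-claims={len(balanced) - n_pos}")
```

On the toy data each method yields equal class counts: 10/10 for undersampling, 70/70 for the midpoint hybrid, and 130/130 for oversampling and SMOTE. In typical use, resampling is applied to the training split only, and sensitivity, specificity, and accuracy are computed on held-out data that keeps the original class ratio.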

Faculty of Commerce

Assiut University