
USING MACHINE LEARNING MODELS TO COMPARE VARIOUS RESAMPLING METHODS IN PREDICTING INSURANCE FRAUD

Research Authors
Mohamed Hanafy & Ruixing Ming
Research Date
Research Journal
Journal of Theoretical and Applied Information Technology
Research Member
Research Vol
99
Research Website
http://www.jatit.org/volumes/Vol99No12/4Vol99No12.pdf
Research Year
2021
Research Abstract

One of the most common types of fraud is insurance fraud, and automobile insurance fraud in particular. The cost of automobile insurance fraud is substantial for property insurance companies and has a long-term impact on insurance firms' pricing strategies, so detecting it has become necessary in order to keep insurance rates down. Although predictive models for the detection of insurance fraud are in active use in practice, there are relatively few documented studies on the use of machine learning approaches to detect insurance fraud, likely due to the lack of available data. In this paper, we evaluate 13 machine learning approaches on real-life data. Predicting insurance fraud is a significant challenge because datasets in this area are imbalanced: our data consist mostly of a "non-fraud claims" class with a small percentage of "fraud claims," so classification models predict fraud poorly. The present study therefore proposes an approach that improves machine learning algorithms' results by using resampling techniques, such as Random Over Sampler, Random Under Sampler, and hybrid methods, to address the issue of unbalanced data, and compares these techniques with one another. The paper shows that the performance of all ML classifiers improves after resampling. Furthermore, the findings confirm that no single resampling method outperforms the others overall. Among all the models, the Stochastic Gradient Boosting classifier obtained the best result when using the hybrid resampling technique.
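
The three families of resampling named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names, the toy label counts (940 non-fraud, 60 fraud), and the "meet at the mean class size" rule used for the hybrid variant are all assumptions made for illustration. The abstract does not say which hybrid method the paper uses.

```python
import random
from collections import Counter


def split_by_class(labels):
    """Map each class label to the list of row indices carrying it."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    return by_class


def random_over_sample(labels, rng):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    by_class = split_by_class(labels)
    target = max(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(idx)
        chosen.extend(rng.choices(idx, k=target - len(idx)))  # with replacement
    return chosen


def random_under_sample(labels, rng):
    """Keep a random subset of each class, sized to the smallest class."""
    by_class = split_by_class(labels)
    target = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, target))  # without replacement
    return chosen


def hybrid_sample(labels, rng):
    """Illustrative hybrid (an assumption): shrink large classes and grow
    small ones until each reaches the mean class size."""
    by_class = split_by_class(labels)
    target = len(labels) // len(by_class)
    chosen = []
    for idx in by_class.values():
        if len(idx) >= target:
            chosen.extend(rng.sample(idx, target))
        else:
            chosen.extend(idx)
            chosen.extend(rng.choices(idx, k=target - len(idx)))
    return chosen


# Synthetic toy labels, not data from the paper.
labels = ["non-fraud"] * 940 + ["fraud"] * 60
rng = random.Random(0)

over = Counter(labels[i] for i in random_over_sample(labels, rng))
under = Counter(labels[i] for i in random_under_sample(labels, rng))
hybrid = Counter(labels[i] for i in hybrid_sample(labels, rng))

print("over-sampled: ", dict(over))
print("under-sampled:", dict(under))
print("hybrid:       ", dict(hybrid))
```

Each function returns row indices, so the same selection can be applied to a feature matrix and its labels together. In practice, resampling is normally applied only to the training split, so that evaluation still reflects the original class imbalance.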