Preventing motor claims fraud: Supervised vs unsupervised learning

According to 2021 figures from the Association of British Insurers (ABI), 49,000 motor insurance fraud cases were detected, valued at £577 million. Motor fraud continues to account for 60% of all claims fraud detected (Aviva 2022).

In recent years, Artificial Intelligence (AI) has transformed how the insurance industry operates. One use case is identifying fraud at every touchpoint in the customer journey. Here, the focus is on First Notice of Loss (FNOL), where the aim is to enable same-day payments, improving customer experience while minimising fraud.

In the customer journey, there are several points where data is continuously provided in myriad formats, including structured and unstructured data and images. At these points, AI models are being deployed to detect patterns and anomalies that may indicate fraud.

Two modelling techniques are having a transformative impact in motor claims: supervised and unsupervised machine learning.

Supervised Learning

Supervised machine learning involves training a model on ‘labelled’ claims, meaning claims where an investigation outcome is known, for example Confirmed Fraud, Defeated, Partial Loss or Clear.

New claims are scored against the supervised model to highlight any high degree of similarity with adverse claims. Synectics have a wealth of experience in this field, with some of the top-performing Precision models capturing over 90% of the fraudulent claims from less than 2% of the data at a False Positive Rate (FPR) of 2:1. Supervised models can be applied almost anywhere. The only requirements are a reasonable amount of data and the label or outcome you want to predict. Applications include fraud prevention, marketing optimisation and pricing.
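As an illustration only, the idea of scoring a new claim by its similarity to labelled claims can be sketched with a tiny nearest-neighbour scorer. This is not the Precision model; the features, figures and function names below are hypothetical and chosen purely to show the mechanics.

```python
import math

# Hypothetical labelled claims: (features, label).
# label 1 = adverse outcome (e.g. confirmed fraud), 0 = clear.
# Illustrative features: (days from policy start to incident, claim value in £ thousands).
LABELLED_CLAIMS = [
    ((3.0, 9.5), 1),
    ((5.0, 8.0), 1),
    ((7.0, 11.0), 1),
    ((200.0, 2.0), 0),
    ((340.0, 1.5), 0),
    ((150.0, 3.0), 0),
    ((410.0, 2.5), 0),
    ((90.0, 4.0), 0),
]


def fit_scaler(rows):
    """Return per-feature minimums and maximums for min-max scaling."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(row, mins, maxs):
    """Scale each feature to the 0-1 range so no single feature dominates distance."""
    return tuple(
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, lo, hi in zip(row, mins, maxs)
    )


def adverse_score(new_claim, labelled, k=3):
    """Share of the k most similar labelled claims that had an adverse outcome."""
    mins, maxs = fit_scaler([features for features, _ in labelled])
    target = scale(new_claim, mins, maxs)
    by_distance = sorted(
        labelled,
        key=lambda item: math.dist(target, scale(item[0], mins, maxs)),
    )
    nearest = by_distance[:k]
    return sum(label for _, label in nearest) / k


# A claim made days after inception for a high value resembles the adverse examples.
print(adverse_score((4.0, 10.0), LABELLED_CLAIMS))
# A claim made late in the policy term for a low value resembles the clear examples.
print(adverse_score((300.0, 2.0), LABELLED_CLAIMS))
```

A production model would use far more features and a more capable algorithm, but the principle is the same: the labels from past investigations decide what "looks like fraud".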

Supervised models can directly target specific challenges (e.g. staged accident fraud), provide a comprehensive fraud defence and consistently and quickly produce quality referrals. They are also predictable and explainable, aiding model governance approval as AI regulation increases.

For example, one of Synectics Solutions’ clients using the Precision model saved £6 million in the first month. Another reduced referrals by 50% and increased conversion rates to fraud by 30%.

Unsupervised Learning

Unsupervised learning involves building models on data which hasn’t yet been labelled. It relies on statistical techniques like anomaly detection to identify claims that are not yet marked as adverse but are nonetheless unusual compared to a typical claim portfolio. It deploys an algorithm to discover hidden patterns and data groupings without human intervention.
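To make anomaly detection concrete, here is a minimal sketch using one well-known statistical technique, the modified z-score based on the median and the median absolute deviation. It is a stand-in for illustration, not the method used in any Precision model, and the claim values are invented. Note that no labels are involved at any point.

```python
import statistics


def modified_z_scores(values):
    """Modified z-score: distance from the median, scaled by the median absolute deviation.

    The 0.6745 constant makes the score comparable to a standard z-score
    for normally distributed data.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # All values are (nearly) identical, so nothing stands out.
        return [0.0 for _ in values]
    return [0.6745 * (v - median) / mad for v in values]


def flag_anomalies(values, threshold=3.5):
    """Return the positions of values whose modified z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]


# Hypothetical claim values in pounds; one is far from the typical portfolio.
claim_values = [1200, 1350, 1100, 1280, 1420, 1190, 1310, 9800, 1250, 1330]
print(flag_anomalies(claim_values))
```

This sketch looks at a single feature. A real anomaly model would consider many features together, but the output is the same kind of signal: "unusual", not "known to be fraud".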

This type of modelling is well-suited to highlighting new fraud modus operandi and is a good fit for high-volume use cases like quotes and transactions. Unsupervised models are also useful where labelled data is not available, such as when launching into new markets.

For example, a Synectics client identified incremental cases of fraud and anti-money laundering concerns at a better than 2:1 FPR using an anomaly detection Precision model.
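The headline figures quoted for these models can be derived from referral outcomes. The sketch below assumes that a 2:1 FPR means two false positives for every confirmed fraud among the referred claims, which is how the ratio is commonly expressed in fraud operations; the article does not define it, so treat that reading, the function name and the counts as assumptions.

```python
def referral_metrics(flagged, is_fraud):
    """Summarise referral performance from parallel lists of booleans.

    flagged[i]  -> True if the model referred claim i for investigation
    is_fraud[i] -> True if claim i was ultimately confirmed as fraud
    """
    true_pos = sum(1 for f, y in zip(flagged, is_fraud) if f and y)
    false_pos = sum(1 for f, y in zip(flagged, is_fraud) if f and not y)
    total_fraud = sum(1 for y in is_fraud if y)
    return {
        # Share of all claims that were referred.
        "referral_rate": (true_pos + false_pos) / len(flagged),
        # Share of all fraud that the referrals captured.
        "capture_rate": true_pos / total_fraud if total_fraud else 0.0,
        # False positives per true positive (2.0 corresponds to "2:1").
        "false_positive_ratio": false_pos / true_pos if true_pos else float("inf"),
    }


# Invented portfolio: 2,000 claims, 10 fraudulent.
# The model refers 27 claims: 9 are fraud, 18 are not. One fraud is missed.
flagged = [True] * 27 + [False] * 1973
is_fraud = [True] * 9 + [False] * 18 + [True] * 1 + [False] * 1972

metrics = referral_metrics(flagged, is_fraud)
print(metrics)
```

With these invented counts the model refers 1.35% of claims, captures 90% of the fraud and produces two false positives per confirmed fraud.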

However, unsupervised models do have some disadvantages. They are prone to high false positives, can be overly sensitive and provide limited understanding of how they will perform in live environments. Consequently, obtaining unsupervised model governance approval for production use is often difficult.

Data is King

When talking about supervised or unsupervised learning, we must mention data. Regardless of modelling technique, it’s imperative that the data being analysed is accurate, timely and relevant to the problem being solved.

Within the claims models which showed optimal AUC (area under the curve), Synectics utilised core data comprising claimant information (including information obtained at quote and policy stage), accident details (structured and unstructured), vehicle data etc. However, the most predictive features, which added a 30% increase to the performance of the model, were syndicated fraud data from National SIRA.
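For readers unfamiliar with AUC, it can be read as the probability that a randomly chosen fraudulent claim receives a higher score than a randomly chosen genuine one. The sketch below computes it directly from that definition; the scores and labels are invented and the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed by comparing every fraud/genuine pair.

    A pair counts as 1 if the fraudulent claim scores higher, 0.5 if tied.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one fraudulent and one genuine claim")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented model scores with known outcomes (1 = fraud, 0 = genuine).
scores = [0.90, 0.80, 0.35, 0.60, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 0]
print(round(auc(scores, labels), 3))
```

An AUC of 0.5 is no better than chance and 1.0 is a perfect ranking, which is why it is a convenient way to compare a model with and without an additional data source.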

Difference between the two models

The key difference between the supervised and unsupervised models is the approach to using labelled datasets.

  • Supervised modelling uses labelled datasets to check for adverse claims, whereas an unsupervised modelling algorithm does not.
  • In supervised modelling, the algorithm learns from the labelled datasets and makes predictions based on similarity to historic labelled behaviours. Unsupervised modelling, by contrast, highlights data that looks unusual or different in general, so it tends to be less accurate than supervised modelling.
  • Unlike unsupervised modelling, supervised modelling requires human intervention to label the dataset in the first place.
  • Unsupervised modelling can work autonomously to detect anomalies and inconsistent or atypical patterns or behaviours.

Our experience

Synectics have years of experience in deploying AI and machine learning within the insurance field, specifically looking at increasing conversion rates of successful claims. Through the use of Precision, a customer achieved their best ever conversion rate of about 55%. The models within the organisation’s strategies vary across supervised and unsupervised learning depending on the business focus and data available. In the Synectics hub, data scientists set out to challenge the current modelling by developing a Proof of Concept (PoC) comparing supervised and unsupervised modelling techniques, which is used as the basis of the assessment.

Synectics developed both a supervised Precision model and an anomaly detection Precision model (a branch of unsupervised modelling) as a PoC to quantify the difference in performance of the two AI approaches.

The purpose of this PoC was to determine which technique performed better as an FNOL anti-fraud defence. Both models used the same base data set to ensure a fair comparison.

The PoC results showed that using both Precision models as part of a holistic anti-fraud claims strategy provided comfortably the best overall performance. Of the cases classed as high risk by either model, 98.5% were flagged by only one of the two models, showing how well they complement each other.
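The complementarity figure is a simple set calculation: take every case flagged by either model and ask what share was flagged by exactly one of them. The sketch below shows the arithmetic. The case identifiers and counts are invented and were chosen only so that the result matches the 98.5% quoted above; they are not the PoC data.

```python
def exclusive_share(flagged_by_a, flagged_by_b):
    """Share of cases flagged by either model that were flagged by only one of them."""
    either = flagged_by_a | flagged_by_b      # union: flagged by at least one model
    only_one = flagged_by_a ^ flagged_by_b    # symmetric difference: flagged by exactly one
    return len(only_one) / len(either) if either else 0.0


# Invented case IDs: 200 high-risk cases in total, 3 flagged by both models.
supervised_flags = set(range(0, 170))    # cases 0-169
anomaly_flags = set(range(167, 200))     # cases 167-199

share = exclusive_share(supervised_flags, anomaly_flags)
print(f"{share:.1%} of high-risk cases were flagged by only one model")
```

A high share means the two models are finding largely different cases, which is the argument for running them side by side.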

Synectics work with a large number of insurers and, based on the data, if a choice has to be made between the supervised model and the unsupervised model, the supervised model is the superior option. It demonstrated a substantial advantage, flagging over five times as many claims as fraudulent compared with the unsupervised anomaly detection model.
