The Accuracy of Prediction Models in Life Underwriting
December 08, 2015 | By Guizhou Hu
Region: North America
According to IBM, 90% of all the data in the world has been created in the past two years. Each of us is utilizing and providing data every time we log in to our computers, browse the Internet or simply drive down the highway with our iPhone in our pocket. The challenge for every industry is how to harness and effectively utilize that data to drive more profitable business.
The insurance industry is no exception to that rule. For years we have worked to balance the competing demands of gathering better data while controlling costs and meeting time-service expectations. Fortunately, more and more of the traditional data we’ve relied upon - such as medical records, motor vehicle reports, lab studies, and even pharmaceutical databases - has become digitized. This has improved efficiency in many cases, yet we’ve only begun to explore how this data can help us predict mortality more accurately.
One of the many challenges life underwriters and actuaries face in adopting new prediction technology is how to evaluate the accuracy of the model. This blog outlines the basic statistical concept of accuracy of a predictive model in the context of life underwriting.
Discrimination: The first metric of accuracy we utilize is discrimination. This measures the ability of the model to separate individuals at higher and lower risk of death, as well as the ability of the model to predict which individuals will experience the outcome (in the case of life insurance, death) earlier than others. Put another way, when a model has perfect discrimination, every individual who dies has a higher predicted risk than every individual who lives, and individuals who die earlier have higher predicted risk than individuals who die later.
The statistical parameter for model discrimination is the C-statistic, also referred to as the area under the ROC (Receiver Operating Characteristic) curve. In practice the C-statistic takes a value between 0.5 and 1. The lower end of the scale (0.5) is essentially the equivalent of a random coin toss to predict which individuals will live or die, while a score of 1.0 represents perfect differentiation. One can better visualize this by:
• First creating a chart in which individuals are ranked by predicted risk and grouped into deciles (ten equal-sized groups).
• Then visualizing the slope of observed mortality as the predicted mortality risk increases. The steeper the slope, the higher the model’s discrimination.
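The C-statistic itself has a simple pairwise interpretation: it is the probability that a randomly chosen individual who dies was assigned a higher predicted risk than a randomly chosen individual who lives. The sketch below (with hypothetical risk scores and outcomes) computes it directly from that definition, counting ties as half credit.

```python
def c_statistic(predicted_risk, died):
    """C-statistic: share of (death, survivor) pairs in which the death
    received the higher predicted risk. Ties count as half-concordant."""
    pairs = 0
    concordant = 0.0
    for ri, di in zip(predicted_risk, died):
        for rj, dj in zip(predicted_risk, died):
            if di == 1 and dj == 0:      # one death, one survivor
                pairs += 1
                if ri > rj:
                    concordant += 1.0    # death ranked higher: concordant
                elif ri == rj:
                    concordant += 0.5    # tied scores: half credit
    return concordant / pairs

# Perfect discrimination: every death out-ranks every survivor.
risk = [0.9, 0.8, 0.7, 0.2, 0.1]
outcome = [1, 1, 1, 0, 0]
print(c_statistic(risk, outcome))  # 1.0
```

This brute-force pairwise count is O(n²) and meant only to make the definition concrete; production implementations compute the same quantity from ranks or from the ROC curve.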
Calibration: The second accuracy metric is calibration, defined as the closeness of the predicted risk value (predicted mortality for a group) to the actual outcome (observed mortality). The common way to show calibration is to create a chart as outlined above and compare the predicted mortality with the observed mortality in each group. A Chi-square value is a quantitative measure of calibration; the rule of thumb is that a Chi-square below 20 represents a good fit. The lower the Chi-square, the better the model’s calibration.
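The Chi-square comparison above can be sketched in a few lines. The decile counts below are hypothetical: for each risk group we compare expected deaths (the sum of predicted risks in the group) with observed deaths.

```python
def calibration_chi_square(expected_deaths, observed_deaths):
    """Chi-square goodness of fit: sum of (observed - expected)^2 / expected
    across risk groups (here, deciles of predicted risk)."""
    return sum((o - e) ** 2 / e
               for e, o in zip(expected_deaths, observed_deaths))

# Hypothetical expected vs. observed deaths per decile, lowest to highest risk.
expected = [2.0, 3.0, 4.5, 6.0, 8.0, 10.5, 14.0, 18.0, 24.0, 35.0]
observed = [3,   2,   5,   6,   9,   10,   13,   19,   25,   33]

chi2 = calibration_chi_square(expected, observed)
print(round(chi2, 2))  # 1.32 - well below the rule-of-thumb threshold of 20
```

Small observed-versus-expected gaps in every decile keep the statistic low, which is exactly what a well-calibrated model should produce.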
Model discrimination and model calibration capture different characteristics of a model. The following graphs illustrate how good discrimination and good calibration can diverge. It is important to understand that in the context of life underwriting:
A. An underwriter is more interested in model discrimination, which is the power of the model to differentiate risk.
B. A pricing actuary is more interested in model calibration, which is the agreement of the predicted risk with actual mortality.
C. The ideal model is one in which both discrimination and calibration are high.
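The divergence between the two metrics can be made concrete with hypothetical numbers: halving every predicted risk leaves the ranking (and therefore the C-statistic) unchanged, but pulls expected deaths away from observed deaths, so calibration deteriorates while discrimination does not.

```python
def c_statistic(risk, died):
    """C-statistic via pairwise concordance; ties count as half."""
    pairs = concordant = 0.0
    for ri, di in zip(risk, died):
        for rj, dj in zip(risk, died):
            if di == 1 and dj == 0:
                pairs += 1
                concordant += 1.0 if ri > rj else 0.5 if ri == rj else 0.0
    return concordant / pairs

def chi_square(expected, observed):
    """Calibration chi-square across risk groups."""
    return sum((o - e) ** 2 / e for e, o in zip(expected, observed))

# Halving every risk score preserves the ordering, so discrimination is intact.
risk = [0.8, 0.6, 0.3, 0.1]
died = [1, 1, 0, 0]
halved = [r / 2 for r in risk]
print(c_statistic(risk, died), c_statistic(halved, died))  # 1.0 1.0

# Calibration, by contrast, is destroyed: per-group expected deaths
# (sums of predicted risks) no longer match observed deaths.
observed = [20, 40]                 # deaths in low- and high-risk groups
expected_good = [20.0, 40.0]        # well-calibrated expectations
expected_halved = [10.0, 20.0]      # same ranking, every risk halved
print(chi_square(expected_good, observed))    # 0.0
print(chi_square(expected_halved, observed))  # 30.0 - above the ~20 threshold
```

This is why an underwriter (who ranks applicants) and a pricing actuary (who must match expected to actual mortality) can reach different verdicts on the same model.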