“Artificial Intelligence”, “Big Data”, “Data Science” - buzzwords such as these have been mentioned countless times since the early 2000s. Many companies have since moved to a more data-driven culture; insurers, however, have been slower to adopt one. This blog discusses the obstacles that may hold insurers back from becoming data driven, and offers a perspective to aid the development of an overall data strategy.
A survey conducted by Kaggle in 2017, based on responses from more than 16,000 data scientists, gives a good indication of the barriers to overcome.[1] Among the problems most commonly voiced by people working in data science are:
- Dirty data
- Lack of data science talent
- Lack of management/financial support
- Lack of a clear question to answer (i.e. direction)
- Privacy issues
Dirty Data
Dirty data is data that contains errors, or that is unavailable or difficult to access. As the saying “garbage in, garbage out” suggests, if the data quality is poor, it will be difficult for anyone to base a decision on it. Aside from human error in data entry, processes within an insurance company can also cause data to be dirty. For instance, in many organisations the data systems for different tasks or business units are isolated from each other, which adds a layer of complexity because the data is often inconsistent and not accessible across business units. An integrated data system is important because it reduces the risk of a dirty dataset and makes data accessible across business units. A lack of well-defined data governance is usually the cause of dirty data. Only when the data is clean can data analysts and data scientists quantify how much time to invest in modelling.
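To make the idea concrete, the following is a minimal sketch of how data from two isolated systems might be profiled for typical quality problems. It assumes Python and uses only the standard library; the two extracts, their field names and the specific checks are hypothetical and made up for illustration, not taken from any real insurer's systems.

```python
from collections import Counter
from datetime import date

# Hypothetical extracts from two isolated systems (all records are made up).
policy_system = [
    {"policy_id": "P001", "date_of_birth": "1980-04-12", "gender": "F"},
    {"policy_id": "P002", "date_of_birth": "12/07/1975", "gender": "male"},
    {"policy_id": "P003", "date_of_birth": "", "gender": "M"},
]
claims_system = [
    {"policy_id": "P001", "claim_amount": 1200.0},
    {"policy_id": "P004", "claim_amount": 530.0},  # no matching policy record
    {"policy_id": "P002", "claim_amount": -75.0},  # implausible amount
]


def is_iso_date(value):
    """Return True if value is a valid YYYY-MM-DD date string."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def profile(policies, claims):
    """Count simple data quality issues within and across the two systems."""
    issues = Counter()
    known_ids = {p["policy_id"] for p in policies}
    for p in policies:
        if not p["date_of_birth"]:
            issues["missing date_of_birth"] += 1
        elif not is_iso_date(p["date_of_birth"]):
            issues["non-ISO date_of_birth"] += 1
        if p["gender"] not in {"F", "M"}:
            issues["unexpected gender code"] += 1
    for c in claims:
        if c["policy_id"] not in known_ids:
            issues["claim without policy"] += 1
        if c["claim_amount"] <= 0:
            issues["non-positive claim_amount"] += 1
    return dict(issues)


report = profile(policy_system, claims_system)
for issue, count in sorted(report.items()):
    print(f"{issue}: {count}")
```

A report like this does not clean the data, but it gives a rough count of the problems to resolve before any modelling effort can be estimated.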
Lack of Talent
Understanding the different roles in the analytics value chain and finding the right person with the right skillset can be difficult. For instance, during the data acquisition phase, it is important that data engineers, data architects and different business stakeholders come together to formulate a sustainable framework for data governance. The high demand for the various data science roles makes it quite likely that too few people are available to address relevant projects, manage unrealistic expectations stemming from the overall hype, and bridge the gap between business questions and analytics tasks.
Lack of Support
Management and financial support are important in driving a data-driven culture. Often, an organization responds to the hype of “Data Science” by jumping into it - without changing the existing culture of the firm. Becoming a data-driven company does not stop at forming a data science team. Instead, the entire organization should be aligned with the vision of being data centric. That means that all useful data is collected, made accessible to everyone in a usable form and analysed, and that the results feed into relevant business decisions. This also requires financial support for acquiring the right tools for the right task. Otherwise, a lot of time will be wasted on extract, transform and load (ETL) processing. Many organizations also tend to use Excel and Microsoft Access as their sole data storage and processing tools. This makes ETL processing painful at a deeper analysis stage, which causes most business units to move away from being data centric.
Lack of Direction
A data science project requires a clear question to drive it. The aim of a data science project should be to generate business value. However, business value is often not clearly defined by the business owner. A clear idea of how “data science” should be used to generate business value gives the analysis a direction. It also allows the correct type of analytics to be chosen. Therefore, it is important to design the data project meaningfully so that the most value can be derived from it.
Data Security
The issue of data privacy is garnering increasing attention because data breaches can happen in the insurance industry. For instance, when a third-party administrator (TPA) is engaged to administer medical claims, sensitive medical data belonging to policyholders is shared and becomes vulnerable to hacking. To ensure the security of the data and protect policyholders’ privacy, the insurer needs to know how the data is shared, what the TPA is doing with the data and how the data can be traced. Furthermore, the insurer also needs to be aware of possible infringements of the law when using external data in a data science project. Given the obligation to protect policyholders’ privacy and the financial implications of GDPR (the General Data Protection Regulation in the EU), one cannot be too cautious on the topic of data privacy and security.
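As one illustration of controlling how data is shared, the sketch below replaces direct identifiers with a keyed pseudonym before a claim record leaves the insurer. This is an assumption about one possible safeguard, not a description of any particular insurer's or TPA's process; the field names, the key and the helper functions are hypothetical. Pseudonymised data is generally still personal data under GDPR, so this reduces exposure rather than removing the obligation.

```python
import hashlib
import hmac

# Hypothetical secret held only by the insurer; in practice it would come
# from a key management system, never from source code.
SECRET_KEY = b"replace-with-a-key-held-only-by-the-insurer"

# Hypothetical fields that directly identify a policyholder.
DIRECT_IDENTIFIERS = {"policyholder_id", "name"}


def pseudonymise(policyholder_id, key=SECRET_KEY):
    """Derive a stable pseudonym with HMAC-SHA256; only the key holder can re-link it."""
    return hmac.new(key, policyholder_id.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_for_tpa(claim):
    """Drop direct identifiers and attach a pseudonym so records stay traceable."""
    shared = {k: v for k, v in claim.items() if k not in DIRECT_IDENTIFIERS}
    shared["pseudonym"] = pseudonymise(claim["policyholder_id"])
    return shared


# Made-up claim record for illustration.
claim = {
    "policyholder_id": "PH-1001",
    "name": "Jane Doe",
    "diagnosis_code": "J45",
    "claim_amount": 820.0,
}
shared = prepare_for_tpa(claim)
print(sorted(shared.keys()))
```

Because the same identifier always maps to the same pseudonym, the insurer can trace a shared record back to the policyholder, while the TPA sees no name or internal identifier.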
Overall, the importance of a well-thought-out and forward-looking data strategy cannot be over-emphasized. Understanding the issues described above is an important first step towards becoming more data driven - and one that should be taken before that strategy is developed.
Endnote
1. Kaggle Inc., The State of Data Science & Machine Learning (2017), available at https://www.kaggle.com/code/arthurtok/2017-state-of-data-science-kaggle-survey/notebook, accessed May 2019.