Report

CRA-Zones: The Data Behind the Framework

November 2020

Abstract

The CRA-Zones framework defines the minimal elements needed to provide a view of accumulated cyber risk. For natural catastrophe risk, individual policy exposures can be aggregated within geographic zones. Similarly, cyber exposures can be aggregated using CRA-Zones. Location also matters when assessing cyber catastrophe risk, but two additional elements must be taken into account to properly assess cyber risk accumulation: industry and company size. Insured companies with common characteristics related to location, industry, and entity size tend to be exposed to similar types of cyber events because these elements also correspond to the technologies or service providers used. Kovrr conducted extensive research, based on an analysis of millions of cyber events over the last 20 years, to serve as the core empirical validation for the CRA-Zones framework. Below is a subset of that research, in which a study group of 120 CRA-Zones was determined by selecting the CRA-Zones with the highest relevance to the cyber insurance market (the research group was compiled according to the criteria detailed in Appendix A).

The total number of unique companies in the study group is 20,000, with an average of 152 companies per CRA-Zone and a median of 86. The research criteria focused on companies’ location, industry, and entity size, and on the hosting and mail technologies and service providers they use. The results showed a concentration of technologies and services when grouping by location, and further concentration when adding the remaining CRA-Zone elements, entity size and industry, to the analysis. The research shows that companies within the same CRA-Zone tend to use the same service providers and technologies, and that different compositions of service providers and technologies can be found across CRA-Zones.

When trying to estimate accumulations of potential losses from cyber, insurance and reinsurance companies face two main challenges: identifying which policies are exposed to the same cyber events and determining how many policies will be affected at the same time. The former is related to the problem of enumerating all the technologies and service providers each insured relies upon; the latter is equivalent to estimating the footprint of a cyber event. Analyzing accumulations by CRA-Zone enables risk professionals to make sense of the size and extent of potential losses from cyber without necessarily needing to collect detailed information about technologies and service providers for each insured. The framework is completely agnostic to the line of business, which unlocks a full range of possible applications across both silent and affirmative cyber coverages.

Among these applications is the development of aggregate models. This research shows it is possible to estimate the two key ingredients needed for the development of industry loss curves, the hazard and the exposure, using the CRA-Zones as the atomic unit of aggregation. By identifying the correlation across CRA-Zones, an aggregate model can then be developed.

Introduction - What are CRA-Zones™?

The Cyber Risk Accumulation Zones (CRA-Zones) framework defines the minimal elements needed to provide a view of aggregated cyber exposure. Kovrr launched CRA-Zones during its participation in the fourth cohort of the Lloyd’s Lab, the insurance technology accelerator operated by Lloyd’s of London. CRA-Zones is an open framework created to facilitate better communication across players in the cyber insurance value chain. The framework allows users to overlay their own data, such as losses and cyber attack frequency, onto the CRA-Zones to gain additional insight into the risk per zone and to detect correlations between different zones. The framework was created to support efforts to set a standard for data collection in cyber exposure management.

The CRA-Zones are composed of the following three elements:

  • Location - country-level granularity worldwide and state-level granularity in the US, based on the ISO 3166 Alpha-2 standard.
  • Industry - an industry classification based on the SIC classification system.
  • Entity Size - four revenue classification bands commonly used in the insurance industry.
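
As a rough illustration, the three elements can be combined into a single zone label. The sketch below is an assumption-laden example, not Kovrr’s implementation: the `Company` fields, the revenue thresholds in `SIZE_BANDS`, and the band labels `M` and `L` are hypothetical, and the label layout simply imitates zone names such as JP_I_80_S that appear in the results of this report. The segment `I` is assumed here to be a SIC division letter supplied with the company record.

```python
from dataclasses import dataclass

# Hypothetical upper revenue bounds in USD. The report states only that four
# commonly used revenue bands exist, not where their boundaries lie.
SIZE_BANDS = [(10_000_000, "XS"), (100_000_000, "S"), (1_000_000_000, "M")]


def size_band(revenue_usd: float) -> str:
    """Return the label of the first band whose upper bound exceeds the revenue."""
    for upper_bound, label in SIZE_BANDS:
        if revenue_usd < upper_bound:
            return label
    return "L"  # hypothetical label for the largest band


@dataclass(frozen=True)
class Company:
    name: str
    location: str         # ISO 3166 alpha-2 code, e.g. "JP"
    sic_division: str     # assumed: SIC division letter, e.g. "I"
    sic_major_group: int  # assumed: two-digit SIC major group, e.g. 80
    revenue_usd: float


def cra_zone_key(company: Company) -> str:
    """Join location, industry, and entity size into one zone label."""
    return (
        f"{company.location}_{company.sic_division}_"
        f"{company.sic_major_group:02d}_{size_band(company.revenue_usd)}"
    )
```

With these placeholder thresholds, a Japanese company in SIC major group 80 with revenue of 20 million would map to the label `JP_I_80_S`.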

The framework is built to accommodate various levels of data availability. In cases of insufficient data, a data extrapolation technique can be applied to missing data points. CRA-Zones can be analyzed with low or high granularity and in various combinations, so the framework remains usable despite varying quality of data within a group of risks.

Background


Kovrr’s impact-based modeling framework addresses two main event types that can trigger a cyber event. The first event type is service provider events. These events are a failure of a third-party service provider, such as an email provider or a cloud provider. Third-party providers are a dominant part of modern IT architecture and are used by many companies operating today. Any damaging event at a third-party provider (such as a service outage, data leak, or data loss) can lead to significant damage that will entail claims under different coverage types (e.g. BI and extra expenses, restoration, regulatory fines).

The second event type is technology events. These are events that are caused by a flaw in a common third-party software library (shared pieces of code between technology providers) or a widely used product. An example of this event can be a vulnerability in a commonly used database server or a vulnerability in an encryption software library which is used in multiple products such as web servers, point-of-sale, etc.

In Kovrr’s whitepaper “Cyber Catastrophes Explained,” a cyber catastrophe is defined as “an infrequent cyber event that causes severe loss, injury or property damage to two or more, but typically a large population of cyber exposures.” In order for a cyber catastrophe to occur, companies must have a disruption to an important common system or process that is related to either technologies or service providers.


The following three elements were found to correlate with technologies and services. Each of the elements is already used independently by the insurance industry for accumulation analysis and for reporting purposes, so most (re)insurers should have access to data on at least one of them.

  • Location - due to language and localization, local targeted marketing, trends and culture.
  • Industry - companies tend to pick products that answer the specific needs of the business.
  • Entity Size - the size of a company usually determines the nature and scale of the product. For example, larger companies require more robust or complex products to handle their data and infrastructure than smaller companies do, and have more resources to invest in solutions.

Pursuant to this observation, this paper highlights a subset of Kovrr’s research, which serves as the core empirical validation for the CRA-Zones framework. The research, an analysis of the technographics (technologies and service providers used by a specific company) of thousands of companies across five countries, focuses on industries that have suffered cyber attacks in the past and have high cyber insurance purchase rates. The primary resource used in this research is Kovrr’s Industry Exposure Database (IED), which holds detailed firmographic data for millions of unique businesses worldwide. The data contains all elements of a CRA-Zone: company locations, industry classifications, and estimated revenues.

Research Methodology


The initial step of the analysis methodology was to determine research parameters, a study group and a control group.


1. Research Sample Selection Criteria


This research required selection criteria to narrow the full set of companies in Kovrr’s IED down to a representative sample for the study and control groups. Two criteria were used: the location of the companies (five countries) and their technographics, meaning two specific categories of technologies and service providers used by companies.


1.1. Location


In order to demonstrate the application of the CRA-Zones concept globally, the research focuses on a sample of five technologically and industrially developed countries: the US, Germany, the UK, Japan and Spain. Together these countries contain millions of companies spread across different industries and sectors. The analysis of a large, yet focused, group shows both the diversity within each country and the differences that arise within a global distribution.


1.2. Technologies and Service Providers


Another criterion is the specific services and technologies in use by companies in the chosen countries. For this analysis, hosting and mail were chosen because they are among the most common components of infrastructural operations and directly influence the internet presence of companies worldwide. An outage or any other type of disruption to these services and technologies (from technical malfunctions to targeted cyber attacks and ransomware) could lead to significant business interruption, privacy or security leaks and breaches, or other issues that would result in reputational damage, lost income, recovery expenses, legal fees, fines and more.

2. Research Study Group

The dataset was compiled by extracting all the relevant data on companies in Germany, the US, the UK, Spain and Japan from Kovrr’s IED. The companies were then grouped by CRA-Zone, amounting to a total of 3,484 CRA-Zones. Out of this list, a study group of 120 CRA-Zones was determined by picking the CRA-Zones with the highest relevance to the cyber insurance market. The total number of unique companies in the study group is 20,000, with an average of 152 companies per CRA-Zone and a median of 86.

3. Control Group


To test the CRA-Zones hypothesis, a control group was established for comparison. The control group consists of the same companies that appear in the 120 CRA-Zones of the research group. However, instead of being grouped by CRA-Zone, they are assigned to 120 clusters following a uniform distribution. Each cluster matches the size, in number of companies, of one CRA-Zone in the research group. In other words, a CRA-Zone with 300 companies has a corresponding cluster in the control group with 300 companies as well (not necessarily the same companies). This was necessary in order to show the influence of the CRA-Zone elements and the framework’s potential when assessing exposure and risk.
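
The construction of such a control group can be sketched in a few lines. This is a minimal example under stated assumptions, not the procedure Kovrr actually ran: it reads "following a uniform distribution" as a uniformly random shuffle of the pooled companies, and the function name `build_control_clusters` and the input layout (a dictionary from zone label to company identifiers) are hypothetical.

```python
import random


def build_control_clusters(zones, seed=0):
    """Reassign the pooled companies at random into clusters of matching size.

    zones: dict mapping a zone label to the list of company ids in that zone.
    Returns a dict with the same labels, where each cluster has exactly as
    many companies as the zone it corresponds to.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pool = [company for members in zones.values() for company in members]
    rng.shuffle(pool)  # assumed reading of "uniform distribution"

    clusters = {}
    start = 0
    for label, members in zones.items():
        end = start + len(members)
        clusters[label] = pool[start:end]  # same size as the original zone
        start = end
    return clusters
```

Because every company is drawn from the same pool, any difference between a zone and its cluster reflects the grouping itself rather than the size of the group or the companies included.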


4. Research Study Sub-Group for Micro Analysis: Industries Relevant to Cyber Insurance


In an effort to provide comprehensible examples from the study group which are particularly relevant to cyber insurance, the original list of 120 CRA-Zones was narrowed down to a subset of 35. This sub-group is focused exclusively on industries with high cyber insurance purchase rates in each of the five countries.
The average number of companies per CRA-Zone in this subgroup is 231 with a median of 132. The minimum number of companies within a CRA-Zone is 50.

Analysis and Results

The data shows a concentration of the same service providers and technologies being used by companies within a CRA-Zone. Additionally, the data shows different compositions of service providers and technologies across CRA-Zones.

Moreover, the analysis shows a concentration of technologies and services when accumulated solely by location, and further concentration when adding the remaining CRA-Zone elements, entity size and industry, to the analysis.


During the analysis, it became evident that companies can use more than one provider for the same service or more than one technology for the same purpose. Hence, the distribution of service providers and technologies presented for a CRA-Zone is calculated from the number of appearances of each technology or provider in the group (and not from the number of companies using them or their percentage of use within a company).
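
The counting rule above can be sketched as follows. The example is illustrative only: the function names are hypothetical, each company is represented simply as the list of providers it uses in one category, and the provider names in the usage note are placeholders rather than data from the study.

```python
from collections import Counter


def provider_appearances(companies):
    """Count how often each provider appears across a group of companies.

    companies: iterable of lists, one list of provider names per company.
    A company that uses two providers contributes one appearance to each.
    """
    counts = Counter()
    for providers in companies:
        counts.update(providers)
    return counts


def appearance_shares(counts):
    """Convert appearance counts into shares of all appearances in the group."""
    total = sum(counts.values())
    return {provider: n / total for provider, n in counts.items()}
```

For a toy zone `[["MailA", "MailB"], ["MailA"], ["MailB", "MailC"]]` there are five appearances across three companies, so `MailA` has a share of 2/5 even though two of the three companies use it.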

1. Analysis Across 120 CRA-Zones and a Subset of 35 CRA-Zones

The heat maps below present the distribution of mail and hosting service providers and technologies across the subset of 35 CRA-Zones of the study group. The X-axis lists the service providers and technologies in a category (mail or hosting) ordered by market share, and the Y-axis lists the CRA-Zones ordered by location.

*For heat maps presenting the distribution of mail and hosting service providers across all 120 CRA-Zones, see the end of this report.

  • It is evident that within the same CRA-Zone (each row) there is a concentration of technographics (the colors of the squares describing the level of concentration).
  • When comparing CRA-Zones, the composition of the technographics (in the same category) is different between the CRA-Zones (the colors are distributed differently).

2. Comparison of CRA-Zones™ to Control Group

The table below summarizes the comparison between the number of distinct hosting and mail service providers and technologies in use in a CRA-Zone and the number in use in its corresponding control group cluster (across all 120 CRA-Zones).

*Determined by counting the total number of distinct providers in a given CRA-Zone or control group cluster.

The average number of distinct hosting technographics serving the companies within a CRA-Zone is 26.28, while in the control group clusters the average is 57.54, with medians of 22 and 57 and standard deviations of 16.71 and 20.89, respectively. This shows a higher concentration of technographics within CRA-Zones than in the control group: fewer technology and service providers serve the same number of companies.
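
The summary figures above follow from a simple per-group count. The sketch below shows one way to compute them, with hypothetical function names; it assumes each group is a list of per-company provider lists and uses the sample standard deviation, since the report does not say whether the sample or population formula was applied.

```python
from statistics import mean, median, stdev


def distinct_providers(group):
    """Number of distinct providers used by at least one company in the group."""
    return len({provider for providers in group for provider in providers})


def summarize(counts):
    """Mean, median, and sample standard deviation of per-group counts."""
    return {
        "mean": mean(counts),
        "median": median(counts),
        "stdev": stdev(counts),  # assumption: sample, not population, formula
    }
```

Applying `distinct_providers` to every CRA-Zone and to every control group cluster yields two lists of 120 counts, and `summarize` on each list gives the statistics compared in the paragraph above.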

CRA-Zones & Corresponding Clusters
Figure 3: Number of distinct hosting technologies and service providers within a CRA-Zone vs. corresponding control group cluster
Figure 4: Number of distinct mail technologies and service providers within a CRA-Zone vs. corresponding control group cluster

As Figures 3 and 4 show, the total number of distinct service providers and technologies in a CRA-Zone is always lower than in its corresponding control group cluster. A statistical significance test shows that the results are extremely unlikely to be the result of random chance (p < 0.0005).
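
The report does not name the significance test that was used. Purely as an illustration of how paired counts of this kind can be tested, the sketch below applies an exact one-sided sign test, which is a substitute chosen for simplicity and not necessarily the test behind the reported p-value. The function name is hypothetical.

```python
from math import comb


def sign_test_p_value(zone_counts, control_counts):
    """Exact one-sided sign test on paired counts.

    Under the null hypothesis, a zone is equally likely to have more or fewer
    distinct providers than its control cluster. The p-value is the chance of
    seeing at least as many "zone lower" pairs as observed; ties are dropped.
    """
    pairs = [(z, c) for z, c in zip(zone_counts, control_counts) if z != c]
    n = len(pairs)
    k = sum(1 for z, c in pairs if z < c)
    # Binomial(n, 0.5) upper tail: P(X >= k)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
```

If all 120 CRA-Zones have fewer distinct providers than their clusters, this test gives a p-value of 1 / 2^120, far below the 0.0005 threshold quoted above.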

3. Detailed Examples of Top Technographics (Comparison between CRA-Zones™ and Control Groups)

The table below shows the top three hosting and email technologies and service providers in several of the CRA-Zones in the research group and in their corresponding clusters in the control group.

The top three providers in the control group clusters are the leading technographics in the global market, for example, Google or Amazon. The top three providers in the CRA-Zones are likely to include a local or regional provider as well. This can be observed in CRA-Zone JP_I_80_S (small companies in the healthcare industry located in Japan), where two of the top three email providers for the entire CRA-Zone are local Japanese companies. The same pattern appears in CRA-Zone ES_I_82_XS (extra small companies in the education industry located in Spain), where two of the top three hosting providers are European. The presence of local providers can be further seen in Figure 1, where there is a large concentration of the Japanese providers NTT, GMO and KDDI only in CRA-Zones in Japan. No CRA-Zone outside of Japan in the sample uses these providers.
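
A ranking like the one described above can be derived from the same appearance counts. The sketch below is a hypothetical helper, not the report’s code; it assumes each group is a list of per-company provider lists, and the provider names in the usage note are placeholders rather than study data.

```python
from collections import Counter


def top_providers(group, n=3):
    """Return the n providers with the most appearances in the group."""
    counts = Counter(provider for providers in group for provider in providers)
    return [provider for provider, _ in counts.most_common(n)]
```

Running `top_providers` on a CRA-Zone and on its control group cluster, for example `top_providers([["LocalA", "GlobalX"], ["LocalA"], ["LocalA", "GlobalX"], ["LocalB"]], n=2)`, makes it easy to see whether local or regional providers rank among the leaders in one group but not the other.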

4. The Location Element and its Contribution to Risk Accumulation in Cyber

Location is often dismissed in cyber risk assessment because the effects of technology events can be global. Although location is often assumed to matter less for aggregating cyber losses than it does in natural catastrophe modeling, this research shows that location plays a role in cyber risk accumulation.

The table below summarizes the comparison between the number of distinct hosting and mail service providers and technologies in use by companies grouped by location alone and the number in use in their corresponding control group clusters. This analysis was conducted on the same companies as the research study group (120 CRA-Zones).

*Determined by counting the total number of distinct providers in a given location or control group cluster.

The average number of distinct hosting technographics serving the companies within a group accumulated by location alone is 74.62, while in the control group clusters the average is 116.5, with medians of 76 and 113.5 and standard deviations of 21.79 and 9.22, respectively. This shows a higher concentration of technographics within the groups formed by location than in the control group, meaning fewer technology and service providers serve the same number of companies.


To further illustrate the importance and contribution of the location element in the CRA-Zones framework, the distribution of technologies and service providers was analyzed while holding company size and industry constant. The first step was analyzing the distribution of technographics across all the chosen location zones. As presented below, when holding industry and size constant, the results showed different compositions of technographics in different locations. This result is statistically significant (p < 0.005).

Figure 5: The distribution of hosting service providers and technologies in CRA-Zones that differ in their location, ordered by provider market share

Moreover, when analyzing the distribution of technographics by grouping by location alone (meaning, a variety of industries and sizes appear within the clusters), there is less concentration.


This analysis shows that the element of location within a CRA-Zone has an impact on the accumulation of the hazard, and although location has a value for the accumulation of technographics, the industry and size elements add an additional level of insight.

Figure 6: The distribution of hosting service providers and technologies in companies grouped by location alone and ordered by the technographic’s market share.

Each of the three elements of a CRA-Zone contributes to a more accurate analysis of accumulation (with higher concentration, or less widely distributed technographics) than an analysis that aggregates by each element separately.

Conclusion

The CRA-Zones framework has been developed to enable the insurance market to better segment its cyber risk accumulation. The results of the analysis showed a concentration of technologies and service providers when grouping by location, and further concentration when adding the remaining CRA-Zone elements, entity size and industry, to the analysis. These results are a clear indication that companies within the same CRA-Zone tend to use the same service providers and technologies, and that different CRA-Zones contain different compositions of service providers and technologies.

Accurate accumulation of cyber risk based on the hazard can be a cornerstone of addressing some of the main challenges the (re)insurance industry currently faces in the realm of cyber risk modeling. By taking into account location, industry and entity size, the CRA-Zones open framework allows users to estimate which types of cyber events are likely to affect their portfolio without needing access to extensive technographic data. Importantly, when an event occurs, the CRA-Zones framework enables reinsurers and insurers to estimate how the event may spread across CRA-Zones and to apply this knowledge to understand the impact on a portfolio.

The footprint of a cyber catastrophe is described mainly by two parameters: the technology or service provider involved and the propagation pattern of the infection. The resulting trail of damage can also be described more simply by listing the CRA-Zones affected. This observation naturally leads to two possible developments for the framework: event response capabilities and correlation across CRA-Zones. The latter is currently a key consideration taken into account in Kovrr’s catastrophe model framework. The ability to overlay details of an unfolding event would enable insurance companies to react accordingly, respond more efficiently and ultimately better serve their clients.

Ultimately the concept of CRA-Zones is a natural candidate to become the atomic unit of aggregation for an industry loss model for cyber. This research shows it is possible to estimate the two key components for the development of industry loss curves by CRA-Zone: the hazard and the exposure. By estimating the correlation across CRA-Zones, an aggregate model can then be developed.

Extended Heatmaps of Technologies and Service Providers Across 120 CRA-Zones™

David Clouston, Visesh Gosrani, John Butler & Naomi Weisz also contributed to this report.

Marco Lo Giudice, PhD

Head of Pricing Models Development at Kovrr

Or Amir

Product Manager

Geniya Brass Gershovich

Cyber Intelligence Analyst

Amos Israel

Risk Data Scientist