Generalized linear models for insurance data pdf




Language:English, Spanish, Dutch
Published (Last):02.07.2016


Generalized Linear Models for Insurance Data, by Piet de Jong and Gillian Z. Heller. Actuaries should have the tools they need. Generalized linear models are used in the insurance industry to support critical decisions.

Graphical representation of personal injury insurance data. This data set is typical of those amenable to generalized linear modeling. The aim of statistical modeling is usually to address questions of the following nature: What is the relationship between settlement delay and the finalized claim amount? Does legal representation have any effect on the dollar value of the claim? What is the impact of the level of injury on the dollar value of claims? Given that a claim has already dragged on for some time, and given the level of injury and the fact that it is legally represented, what is the likely outcome of the claim? This book aims to point these out and outline useful tools that have been developed to aid in providing answers. Modeling is not an end in itself; rather, the aim is to provide a framework for answering questions of interest. Different models can be, and often are, applied to the same data depending on the question of interest. This stresses that modeling is a pragmatic activity and there is no such thing as the true model. Models connect variables, and the art of connecting variables requires an understanding of the nature of the variables. Variables come in different forms: discrete or continuous, nominal, ordinal, categorical, and so on.

Variables can be quantitative or qualitative. The data displayed in Figure 1. In this case the variable is skewed to the right. Continuous variables are also called interval variables to indicate that they can take on values anywhere in an interval of the real line. Legal representation is a categorical variable with two levels: no or yes. Variables taking on just two possible values are often coded 0 and 1 and are also called binary, indicator or Bernoulli variables.

Binary variables indicate the presence or absence of an attribute, or the occurrence or non-occurrence of an event of interest such as a claim or fatality. Injury code is a categorical variable, also called qualitative. The variable has seven values corresponding to different levels of physical injury: 1 to 6 and 9.

Level 1 indicates the lowest level of injury, 2 the next level and so on up to level 5, which is a catastrophic level of injury, while level 6 indicates death. Level 9 corresponds to an unknown or unrecorded level of injury and hence probably indicates no physical injury.


The injury code variable is thus partially ordered, although there are no levels 7 and 8 and level 9 does not conform to the ordering. Categorical variables generally take on one of a discrete set of values which are nominal in nature and need not be ordered. Other examples of categorical variables are the type of crash (non-injury, injury, fatality) or the claim type on household insurance (burglary, storm, other types).
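The coding described above can be sketched in a few lines. This is only an illustration with made-up records: the helper `indicator_columns` and the column names are hypothetical, and treating the injury code as nominal (rather than ordinal) is one possible choice that keeps level 9 out of the 1 to 6 ordering.

```python
def indicator_columns(values, levels):
    """Return one 0/1 column per level, skipping the first level (the baseline)."""
    _baseline, *others = levels
    return {
        f"injury_{level}": [1 if v == level else 0 for v in values]
        for level in others
    }

# Hypothetical records, not taken from the data set discussed in the text.
injury_codes = [1, 3, 9, 6, 1]
legal_rep = ["no", "yes", "yes", "no", "yes"]

# A two-level categorical variable coded as a 0/1 (binary) variable.
legal_rep_01 = [1 if v == "yes" else 0 for v in legal_rep]

# Injury code has seven levels (1 to 6 and 9); level 1 is used as the baseline.
levels = [1, 2, 3, 4, 5, 6, 9]
dummies = indicator_columns(injury_codes, levels)

print(legal_rep_01)
print(dummies["injury_9"])
```

With seven levels and one baseline, six indicator columns are produced, which is the usual way a categorical variable enters a regression-type model.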

When there is a natural ordering in the categories, such as none, mild, moderate, severe, then the variable is called ordinal. The distribution of settlement delay is in the final panel.

This is another example of a continuous variable, which in practical terms is confined to an integer number of months or days. Data are often converted to counts or frequencies. Examples of count variables are: number of claims on a class of policy in a year, number of traffic accidents at an intersection in a week, number of children in a family, number of deaths in a population. Count variables are by their nature non-negative integers. They are sometimes expressed as relative frequencies or proportions.

The top left panel displays the histogram of log claim size. Compared to the histogram in Figure 1. Historically, normal variables have been easier to model.

However, generalized linear modeling was developed, at least in part, to deal with data that are not normally distributed. Claim size versus settlement delay. The top right panel does not reveal a clear picture of the relationship between claim sizes and settlement delay.
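The effect of taking logs of a right-skewed variable such as claim size can be checked numerically. The sketch below uses invented claim amounts (not the personal injury data) and a plain moment-based skewness coefficient; it is meant only to show the kind of comparison the histograms make visually.

```python
import math

def skewness(xs):
    """Moment-based sample skewness: third central moment over sd cubed."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical claim sizes: many small claims and a few very large ones.
claims = [500, 800, 1200, 2000, 3500, 6000, 15000, 40000, 120000]
log_claims = [math.log(c) for c in claims]

print(f"skewness of claim size:     {skewness(claims):.2f}")
print(f"skewness of log claim size: {skewness(log_claims):.2f}")
```

For these made-up values the raw amounts are strongly skewed to the right, while the logged amounts are much closer to symmetric.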

It is expected that larger claims are associated with longer delays, since larger claims are often more contentious and difficult to quantify. Whatever the relationship, it is masked by noise.

In essence, losing a car tyre to theft means the insurer will only reimburse the tyre and nothing more. Essentially, one cannot profit from an insurance contract. This presents considerable difficulty in pricing the product. The converse is also true. Coming closer to home, Zimbabwe suffered major hyperinflation in — era. When the economy was rebooted by the adoption of the US dollar, insurance firms had to restart their savings as well.

A number of them tried to undercut the competition with outrageously low premiums, until the Insurance and Pensions Commission (IPEC) had to step in to restore sanity and confidence to the industry, which is its government-given prerogative (IPEC Act). A number of Zimbabweans end up opting for the RTA option, since Comprehensive Motor insurance generally becomes expensive as the value of the car goes up.

In addition, this arrangement ignores the existence of other significant factors in pricing comprehensive insurance. In order to achieve the primary objective of the study, the following secondary objectives of the research were identified: 1. Results obtained will be regarded as valid irrespective of how large or thin the datasets were. Insurers need to insure clients profitably, and their clients only want value for money.

This study tried to converge the two ideals, in that GLMs yield actuarially fair premiums and hence result in value for money for clients. Similarly, if the insurers could reduce adverse selection and only accept premiums proportionate to the risk, the insurers would make a profit.

In addition, this study explores the merits of adopting the use of GLMs amongst insurers as a model that has diverse and effective applications in motor insurance.

Future innovations in insurance involve regression modelling, and applications of this model will help do away with the rudimentary pricing system currently used in Zimbabwe. However, the analysis focused on one Zimbabwean insurer. The choice of a single insurer was motivated by the recognition that different insurers offer varied insurance policies.

The data used spanned from to. The data were obtained from one company. The conclusions of this study are representative and useful for the insurance company's business, but they are not of a general character and therefore cannot be applied to all portfolios or insurance companies.

The data used were not obtained through a random selection from the entire population of policyholders. The second chapter contains the literature review, that is, scholarly work done by other people on the topic.

Discussion will be made around the GLM techniques, along with their merits. Other pricing techniques will also be discussed including the issues surrounding their application. Consideration will be made with regards to what has to be appreciated when deciding to develop models based on GLMs.

The third chapter details the methodologies employed in the study. This research made use of secondary data, and this section helps clarify what kind of data was used and what level of cleaning was applied to ready it for modeling.

This section will also outline the models to be used and any software packages on which these analyses were done. The fourth chapter deals with data analysis and presentation. In this section, all preliminary work done before the model was developed is discussed, from the splits all the way down to the correlation tests. The fifth and final chapter concludes the study. Under this section, the researcher discusses the findings from the data, and also attempts to answer the questions presented in Section 1.

It goes on to offer recommendations, ideas and suggestions for future study related to this topic. However, in Zimbabwe the majority of insurance companies use the percentage-based method.


This was proposed by the regulator, the Insurance and Pensions Commission of Zimbabwe (IPEC), in order to restore sanity to the pricing regimes and also bolster public confidence in the insurance industry. Silva and Afonso cover a number of pricing methods in their comparative study conducted in Brazil.

They presented arguments against and in favour of the methods under varying circumstances. Starting with the Zimbabwean setting, it is worth noting that the current percentage method presents considerable advantages to a very young insurance market like that of Zimbabwe.

The dollarization era essentially cut off the amount of data that actuaries could consult in pricing Comprehensive Motor insurance. Chidakwa and Munhupedzi did a study on the impact of dollarization on the Zimbabwean economy, in which they mentioned the excessive inflation that characterized the era preceding. Even at the end of these and other studies, researchers have failed to settle on a definitive USD to ZWD rate prevailing at the time of dollarization.

As such, the actuarial data available for exploitation is only from to date. However, since Zimbabweans generally shy away from Comprehensive Motor insurance, such privileges only exist for large firms such as NicozDiamond and Old Mutual. Smaller companies such as Quality Insurance, Hamilton, and so on, would be left with datasets too thin to create experience-based models.

The percentage-based method has the advantage of being easy to apply. Since the vehicle book value can be easily ascertained by use of evaluators and consulting the open market, one can apply this percentage without challenges.
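The arithmetic of the percentage-based method is simple enough to write down directly. The sketch below is illustrative only: the 4% rate is a hypothetical figure chosen for the example, not a rate quoted in the text, and the function name is made up.

```python
def percentage_premium(book_value, rate=0.04):
    """Premium as a flat share of the vehicle's book value.

    The default rate of 4% is a hypothetical value used for illustration.
    """
    if book_value < 0:
        raise ValueError("book value cannot be negative")
    return book_value * rate

# Two hypothetical drivers with very different risk profiles but the same car value.
drivers = [
    {"name": "low-risk driver", "book_value": 10_000},
    {"name": "high-risk driver", "book_value": 10_000},
]
premiums = {d["name"]: percentage_premium(d["book_value"]) for d in drivers}
print(premiums)
```

Because the book value is the only input, both drivers are charged the same premium, whatever their individual risk.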


It also helped level the playing field as fewer insurers got any comparative advantage over others since for each vehicle book value, almost each and every insurer had the same value to charge as premium. In the same vein, this has stifled consumer choice, since it essentially made all insurers the same, without any charging a fairer price.

This might be the cause of the low uptake of comprehensive motor insurance. However, we cannot be quick to blame the local regulator for such a move. Studies around Africa and the globe show that the least economically developed countries generally apply the same method. Nigeria, Rwanda and Kenya charge around 3. As long as compulsory Third-Party insurance exists in Zimbabwe, there is little incentive for insurers to pick up more complex rating-based, GLM-based or usage-based insurance pricing models.

The average insurer in Zimbabwe still enjoys considerable profits from writing Third Party motor insurance. Usually, adverse selection exists on the comprehensive motor insurance front due to the uniform pricing system. Unless signed up for by a company, the individual would usually sign up for comprehensive insurance if their car is expensive, or they consider themselves riskier. The UK introduced their compulsory motor insurance in the s.

They later introduced rating systems to help determine the premium. Coutts explains that the UK and the Netherlands already had rating systems which were consulted and remodeled time and again, however, the emerging models, such as Generalised Linear Models were still in their infancy. They were purely statistical applications which did not provide much benefit as they did not cover practical aspects of premium rating.

Before Generalised Linear Models became prevalent, rates were used to determine insurance premiums. The whole science, or rather the art, of rating remains an interesting subject to this day. The works of Bailey and Simon presented two risk classifications. The first was a Class Rating, which simply grouped risks by their characteristics, such as gender, vehicle use and model, and so on. The other was a Merit Based system, which rewarded clients based on past driving behaviour, felonies and misdemeanours committed, and so on.

In modern day pricing, these two systems have merged, with the creation of Bonus-Malus systems which we shall consider in a few sections. Although based on UK data, he managed to explain that the points system, which is a variation of the standard relativity rating system had a lot of merits and also demerits. However, from the study, he notes that major UK insurers at the time preferred the method for its simplicity and its ease of manipulation.

This is also the case with the preference for the percentage-based method in Zimbabwe. Ultimately, he called for a detailed breakdown of insurance data, as this allows for an in-depth analysis of claim experience, which in turn requires statistical modelling to help understand the underlying structure.

This idea reinforces the need for insurers to collect as much information as they can to help price insurance risks. An Australian study presented to a Conference Meeting discussed major rating factors.

In addition to identifying factors around the driver, vehicle and location as major factors that influence the rating, Henwood and Wang outlined the existence of interactions between these major factors. They further corroborated Huang and Query, pp. Building on ideas brought forth by Coutts, we note that these analyses are made possible by vast collections of data.

In the UK, the driver is considered the major and most significant component in pricing motor insurance. The rationale is that, in and of itself, a car has some propensity to cause a loss, whether by accident, fire or theft, but only up to a point. For instance, common cars such as the Honda Fit in Zimbabwe are prone to being stolen. In actual usage, the driver is the most significant determinant when considering the occurrence of an accident.

Given that some of the most important factors cannot be easily ascertained in the initial underwriting process, most insurers have introduced the Bonus-Malus coefficient. Lemaire concluded that rates no longer hold if they disregard important factors, whose considerable importance is acknowledged by common sense and experience: individual driving abilities such as accuracy of judgment, swiftness of reflexes, aggressiveness behind the wheel, knowledge of the highway code, and drinking behaviour, are not taken into account in motor insurance rating, a priori, as these variables are impossible to measure in a cost-efficient way.

It is now possible through the use of on-vehicle devices that employ telematics technologies. Back then it would have been difficult to do so without excessive costs.

Although part of the globalized village, Zimbabwe tends to lend itself to the environment as mentioned by Lemaire.

The internet is still a fairly pricey commodity, which means that the use of telematics is still considered a dream by many insurers, as its application remains costly. An example is brought forward with two teenage females, driving the same vehicle model in the same city. The two may exhibit very different accident patterns due to differences in individual behaviour, hence the idea of trying to account for these differences a posteriori, by adjusting the premium from individual claims experience.

Slowly, the ideas we have examined up until this point tend toward a premium that is also proportionate to risk. In South Africa, a similar albeit slightly different version exists.

The short-term insurance industry has historically recognised the need to reward clients for every year they do not claim, while penalising clients that do submit claims. The idea is to encourage better risk management, as well as to discourage clients from submitting nuisance claims.
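A reward-and-penalty scheme of this kind can be sketched as a small transition rule. Everything numeric below is an assumption made for illustration: the six discount levels, the one-level reward for a claim-free year and the two-level penalty after a claim are hypothetical, not the rules of any actual South African or Zimbabwean insurer.

```python
# Hypothetical discount scale in whole percentages, indexed by bonus level.
DISCOUNT_PCT = [0, 10, 20, 30, 40, 50]

def next_level(level, had_claim, penalty=2):
    """Move up one level after a claim-free year, down `penalty` levels after a claim."""
    if had_claim:
        return max(0, level - penalty)
    return min(len(DISCOUNT_PCT) - 1, level + 1)

def premium_path(base_premium, claim_history, start_level=0):
    """Premium charged in each year, given whether a claim occurred in that year."""
    level = start_level
    premiums = []
    for had_claim in claim_history:
        premiums.append(base_premium * (100 - DISCOUNT_PCT[level]) / 100)
        level = next_level(level, had_claim)
    return premiums

# Two claim-free years, then a claim in year three, then another claim-free year.
print(premium_path(1000, [False, False, True, False]))
```

In this toy scheme a single claim wipes out two years of accumulated discount, which is the kind of incentive that can lead clients to absorb small losses themselves.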

Clients will tend to cover claims themselves in order to enjoy the bonus or prevent an increase in future premiums. While this is an advantage for the insurer, the insured does not enjoy value for money if they engage in such practices. Introduced into modern insurance by British actuaries from the City University, John Nelder and Robert Wedderburn in , and later illustrated by McCullagh and Nelder, the Generalised Linear Model (GLM) has been one of the most popular tools used to rate motor insurance for decades.

It is a highly valid methodology for vehicle insurance ratemaking: it can easily handle a large number of risk combinations and establish complex relationships related to claim experience (Huang and Query). Nelder and Verrall showed how credibility theory can be encompassed within the theory of GLMs.

In that vein Schmitter gave a simple method to estimate the number of claims needed for a GLM tariff calculation. The other one is a Multiplicative model. As outlined by Ohlsson and Johansson , Goldburd, et. With sufficient values, one can easily get negative premium values and claim estimates.
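The contrast between an additive and a multiplicative rating structure can be illustrated with a few invented numbers. The base premium, the adjustments and the relativities below are all hypothetical; the sketch only shows why stacking large additive discounts can push a premium below zero while a product of positive relativities cannot, and that the multiplicative form is linear on the log scale.

```python
import math

def additive_premium(base, adjustments):
    """Base premium plus a sum of per-factor adjustments (can go negative)."""
    return base + sum(adjustments)

def multiplicative_premium(base, relativities):
    """Base premium times a product of per-factor relativities (stays positive)."""
    return base * math.prod(relativities)

base = 400.0                              # hypothetical base premium
adjustments = [-150.0, -180.0, -120.0]    # hypothetical additive discounts
relativities = [0.8, 0.7, 0.9]            # hypothetical multiplicative relativities

add = additive_premium(base, adjustments)
mult = multiplicative_premium(base, relativities)

# On the log scale the multiplicative model is a sum of terms.
log_scale = math.exp(math.log(base) + sum(math.log(r) for r in relativities))

print(f"additive: {add:.2f}  multiplicative: {mult:.2f}  via logs: {log_scale:.2f}")
```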

The multiplicative model is the most widely used in actuarial applications, and it is also easier to develop a rating model from it, since rating models also employ multiplicative relativities. Also of particular importance to premium rating are the Tweedie models, named after a British statistician who presented a thorough study of the concept in. The Tweedie models are well suited to premium estimation, since most of their mass is close to zero and the remaining mass is skewed to the right (Goldburd et al.), which is essentially how most claim distributions appear when depicted graphically.
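The shape described above can be reproduced by simulation. For power parameters between 1 and 2, a Tweedie variable can be represented as a Poisson number of claims with gamma-distributed sizes, and the sketch below draws annual losses that way. The claim rate and the gamma shape and scale are hypothetical values chosen for illustration, and the Poisson sampler is a simple textbook method suitable only for small rates.

```python
import math
import random

def poisson_draw(rng, lam):
    """Knuth's multiplication method for a Poisson draw (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def annual_loss(rng, claim_rate, shape, scale):
    """Total loss in one policy year: Poisson claim count, gamma claim sizes."""
    n_claims = poisson_draw(rng, claim_rate)
    return sum(rng.gammavariate(shape, scale) for _ in range(n_claims))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical parameters: 0.1 claims per year, mean claim size 2.0 * 1500 = 3000.
losses = [annual_loss(rng, 0.1, 2.0, 1500.0) for _ in range(10_000)]

share_zero = sum(1 for x in losses if x == 0) / len(losses)
mean_loss = sum(losses) / len(losses)
print(f"share of policy years with no loss: {share_zero:.3f}")
print(f"average annual loss: {mean_loss:.1f}")
```

Most simulated policy years have no loss at all, and the remaining ones form a right-skewed tail, which matches the description of claim data in the text.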

A few problems develop when one wants to implement Generalised Linear Models in comprehensive insurance pricing, especially in the modern day global context. In a US based study, Williams and Shabanova concluded that young males were more likely than young females to be responsible for crash deaths, whereas females in their 50s and older were more likely than same-age males to be responsible. In terms of responsibility for deaths per licensed driver, young drivers, especially males, had the highest rates because of their high involvement rates and high responsibility rates.

If those two are factors that cause crashes, they surely have to be considered in pricing. Although not yet enforced in Zimbabwe, advocates of human rights and the speeding pace of globalization will soon require that we implement a similar ban as well.

By gathering more detailed information, insurers are guaranteed that such a move will cause little to no effect on their pricing methodologies. To fully explain their ideas, a variable termed Efficiency was created, by means of which the increase in price is calculated so that there is a reduction of one percentage point compared to the actual price in their dataset.

The comparative advantage of the historical pure premium can be explained by the low variation of the basic indicators over time and the fact that it contains the data for all insured vehicles in Brazil, enabling high adherence to the technique. However, GLM appears to be an interesting pricing alternative for medium portfolios that are seeing growth, or that explore possible niche markets. Working backwards leads to a fixed value of a premium as well.

As mentioned by Silva and Afonso, higher variability is also associated with this method, making it less efficient than the two methods cited above. At the end of the day, one is better served by applying GLMs, as they are more efficient. Another approach that Huang and Query recommended is combining a Max Model with a GLM to improve accuracy in cases where the data has correlated risk factors.

Usually, the GLM handles fairly correlated factors quite well; it only struggles to fit values where the factors are highly correlated. Huang and Query noted that the practice of selecting the most significant and least correlated factors did not work well in China, since it leaves very few factors to work with. They went on to suggest the use of the correlated factors, as they narrow the gap amongst relativities.
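Before fitting, the degree of correlation between candidate rating factors can be checked with an ordinary Pearson coefficient. The sketch below uses invented values for two factors that tend to move together, vehicle book value and engine capacity; the numbers and variable names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical policies: book value (USD) and engine capacity (cc).
book_value = [5000, 8000, 12000, 20000, 35000]
engine_cc = [1000, 1300, 1500, 2000, 3000]

r = pearson(book_value, engine_cc)
print(f"correlation between book value and engine capacity: {r:.3f}")
```

A coefficient this close to 1 signals that the two factors carry largely the same information, which is the situation in which the text says a GLM struggles.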

Policyholders that give incorrect information do not receive a substantial discount from the rate system, and there is enough redundancy to correct it. For example, actuaries in China used the book value of the car, engine capacity, manufacturer or type, mileage, etc.

This does not satisfy the assumptions of GLM. Other factors increase the complexity of distinguishing risks. Often major problems can be traced to one specific factor, while the other factors in the model are considered accurate and reliable. More models may be developed from GLMs, such as those mentioned by Goldburd et al. Coming to the modern era, one will notice the undoubted influence that GLMs have had on the pricing of motor insurance.

Halliday mentioned GLMs as a basis for more advanced techniques. In neighbouring South Africa, companies like Discovery are already employing telematics in vehicle-based insurance. Before we explore the technicalities of these concepts, it is imperative to explore the following definition.

In Computer Science, AI research is defined as the study of intelligent agents: any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. This means the average consumer can finally have premium being determined by individual factors unique only to that individual.

Studies by Lui et al. Lui et al. They had built on earlier work by Han et al. However, this does not compare widely with global offerings in that it is only based on mileage. It has to be mentioned that even the most complicated methods such as the ones that utilise telematics and AI all have at their root, either GLMs or Classical Linear Models.

In conclusion, while the world is already far off in terms of how they apply technology within the insurance sector, Zimbabwean insurers have a lot to gain by shifting to statistical methods in pricing and reserving as compared to the deterministic methods they are currently using.

In addition, this means the local insurance industry will be a step closer to catching up with the world.

Before the technicalities of GLMs are considered, focus will be drawn to the steps taken to prepare the data before the development of an appropriate model. Since each organisation has varying processes and systems for collating the data, the actuary has to be well versed in the common themes and situations. In each modeling process, the data preparation step is repeated, as correcting one error helps discover another. The immediate problem with assembling such a dataset is that exposure and claims data tend not to be stored in the same place. In many organizations, a policy-level exposure database is housed within the underwriting area and a claims database is housed within the claims handling area.

So, the first task of a modeling project is often to locate these two datasets and merge them. If best practices are followed, merging these two datasets would not be time-consuming.

Database-specific methods are available for organisations that use other storage methods. Using the unique IDs, which were policy numbers in this case, the two sets were then merged. While it is not easy to present a one-size-fits-all formula for error detection, human judgment can be applied to look for outliers and so on.
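A merge on policy number can be sketched as follows. The field names (`policy_no`, `exposure_years`, `amount`) and the records are hypothetical, and this is not the procedure or software used in the study; it only illustrates aggregating claims per policy and attaching them to the exposure records, while reporting claims that match no policy.

```python
from collections import defaultdict

# Hypothetical exposure records (one row per policy) and claim records.
exposure = [
    {"policy_no": "P001", "exposure_years": 1.0, "vehicle_value": 8000},
    {"policy_no": "P002", "exposure_years": 0.5, "vehicle_value": 15000},
    {"policy_no": "P003", "exposure_years": 1.0, "vehicle_value": 5000},
]
claims = [
    {"policy_no": "P001", "amount": 1200.0},
    {"policy_no": "P001", "amount": 300.0},
    {"policy_no": "P003", "amount": 2500.0},
    {"policy_no": "P999", "amount": 700.0},  # no matching exposure record
]

def merge_exposure_and_claims(exposure, claims):
    """Attach claim count and total claim amount to each exposure row."""
    totals = defaultdict(lambda: {"claim_count": 0, "claim_amount": 0.0})
    for claim in claims:
        entry = totals[claim["policy_no"]]
        entry["claim_count"] += 1
        entry["claim_amount"] += claim["amount"]

    no_claims = {"claim_count": 0, "claim_amount": 0.0}
    merged = [{**row, **totals.get(row["policy_no"], no_claims)} for row in exposure]

    known = {row["policy_no"] for row in exposure}
    unmatched = sorted(set(totals) - known)
    return merged, unmatched

merged, unmatched = merge_exposure_and_claims(exposure, claims)
for row in merged:
    print(row)
print("claims without a policy:", unmatched)
```

Policies without claims are kept with zero counts, since claim-free exposure is as informative for a frequency model as exposure with claims.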

This could be done in the following steps:

A. Check for Duplicates. In exposure data, one checks for duplicate policy numbers. Such checks should be done prior to merging the data so that errors are not imported into the new dataset.

B. Reasonability Checks. In addition to duplicates, numerical values are checked for reasonability.

For instance, we do not expect negative ages in our datasets.
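The two checks above can be sketched as small helper functions. The field names, the age bounds of 16 and 110, and the sample rows are assumptions made for illustration; real bounds would depend on the portfolio and on local licensing rules.

```python
from collections import Counter

def duplicate_policy_numbers(rows):
    """Policy numbers that appear more than once in the exposure data."""
    counts = Counter(row["policy_no"] for row in rows)
    return sorted(policy for policy, n in counts.items() if n > 1)

def unreasonable_rows(rows, min_age=16, max_age=110):
    """Flag rows whose numerical values fall outside plausible ranges.

    The age bounds are hypothetical defaults chosen for this example.
    """
    flagged = []
    for row in rows:
        problems = []
        if not (min_age <= row["driver_age"] <= max_age):
            problems.append("driver_age")
        if row["vehicle_value"] <= 0:
            problems.append("vehicle_value")
        if problems:
            flagged.append((row["policy_no"], problems))
    return flagged

# Hypothetical exposure rows containing one duplicate and two bad values.
rows = [
    {"policy_no": "P001", "driver_age": 34, "vehicle_value": 8000},
    {"policy_no": "P002", "driver_age": -5, "vehicle_value": 15000},
    {"policy_no": "P002", "driver_age": 41, "vehicle_value": 15000},
    {"policy_no": "P003", "driver_age": 52, "vehicle_value": 0},
]

print(duplicate_policy_numbers(rows))
print(unreasonable_rows(rows))
```

Flagged rows are returned rather than dropped, so that a person can decide whether each one is a data-entry error or a genuine record.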