Introduction to SMILE

SMILE forecasts the life cycle of every individual in the Danish population, which allows for a much higher level of detail than is possible in the DREAM group's other models.

SMILE has been under development since 2010, when it was first used to forecast household demand in Denmark. The model has since been developed further and has gradually become very extensive. As a result, it can now provide insight into future developments in the Danish population as described by a wide array of demographic characteristics such as family structure, education choice, socioeconomic status, income, savings and wealth, moving patterns and housing choices.


SMILE is a dynamic microsimulation model that forecasts and analyses the life cycles of the Danish population on an individual-specific level.

The model uses register data, which means that the initial population represents the actual Danish population at the individual level. Each individual has a wide array of variables that describe characteristics such as education, labour market status, family status, municipality of residence, housing characteristics, etc.

Every individual in the population is subjected each year to a variety of probabilistic events, for example death, moving, enrolling in a new education, or a change in labour market status. If an event is found to occur, the individual moves to a new state. On the basis of these events, a life cycle is formed for every individual.

The current version of SMILE forecasts demography, household structure, education level, socioeconomic characteristics, housing demand, income, taxation, public benefits, and labour market pensions.

SMILE is a microsimulation model. A defining feature of such models is that they are based on individual “entities” which can be either individual persons or families.

The model is based on an initial population where each individual is described by a number of characteristics including gender, age, education, family type, labour market status, income level etc. The model also records which family an individual belongs to, and which type of dwelling the family occupies. The simulation projects the initial population from period to period, where each period corresponds to one year. In the process, the characteristics of each individual are updated every period. The updating is achieved by “exposing” individuals and households to a number of possible events. For an individual, possible events include beginning or finishing an education, a shift in income level, and of course death. For a family, examples of events include marriage, divorce, and moving to another dwelling. In order to determine whether or not a specific event is realized, each person is “asked” a question to which the answer is either “yes” or “no”. The questions depend on the characteristics of the person. A typical question would be to ask a 30-year-old male in a single-adult household whether he will find a partner during the following year.

Answers to these questions are randomly determined using transition probabilities, which depend on the characteristics of the individual. A transition probability is the probability that a specific event takes place during the following year. In the example given above, it is the probability that a single 30-year-old male finds a partner during the following year. Transition probabilities are calculated from historical observations. If the event is found to take place, its effects are implemented in the model. To continue the example, this requires that a single female has also answered “yes” to the question of whether she will find a partner, in which case the two individuals form a couple. In the following period, neither of them will be asked whether he or she will find a partner. If the event does not take place, however, the individuals will be asked the same question again in the following period. In this way, it is possible to simulate the remaining life cycle for all individuals in the initial population and thereby form long-run projections.
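The yes/no mechanism described above can be illustrated with a minimal Python sketch. It is not taken from SMILE: the probability table, the names FIND_PARTNER_PROB, answers_yes and years_until_yes, and all numbers are hypothetical, and ageing and matching with a partner are left out for brevity.

```python
import random

# Hypothetical annual transition probabilities, keyed by (sex, age, household type).
# The numbers are invented for illustration and are not SMILE estimates.
FIND_PARTNER_PROB = {
    ("male", 30, "single"): 0.12,
    ("female", 29, "single"): 0.14,
}

def answers_yes(probability, rng):
    """Return True ("yes") with the given probability, otherwise False ("no")."""
    return rng.random() < probability

def years_until_yes(person, rng, max_years=50):
    """Ask the same question every year until the answer is "yes".

    Returns the number of years asked, or None if the answer is never "yes".
    Ageing is ignored here to keep the sketch short.
    """
    key = (person["sex"], person["age"], person["household"])
    probability = FIND_PARTNER_PROB.get(key, 0.0)
    for year in range(1, max_years + 1):
        if answers_yes(probability, rng):
            return year
    return None

rng = random.Random(2013)  # fixed seed so the run is reproducible
person = {"sex": "male", "age": 30, "household": "single"}
waiting_time = years_until_yes(person, rng)
```

Comparing a uniform random draw with the transition probability is the simplest way to turn a probability into a yes/no answer; over many individuals the share of “yes” answers approaches the probability itself.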

Underlying Theory

Overall, the projection in SMILE is determined by estimated behavioural patterns that, via Monte Carlo simulations, determine the behaviour of the model agents, i.e. individuals and families. The aim is for the choice of algorithms used to estimate the transition probabilities to be data-driven, with the algorithms selected by cross-validation. The development of the model also reflects, in part, the experience of the researchers themselves. This experience-driven element shows primarily in the choice of estimation period, of relevant data, and of relevant algorithms for calculating transition probabilities.

Other important methods in microsimulation are alignment, i.e. adapting the model to the development of, for example, the population and the workforce taken from other projections, and the matching of individuals to families.
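As an illustration of alignment, the sketch below rescales individual event probabilities so that the expected number of events matches a target taken from another projection. Proportional scaling is only one of several alignment techniques used in microsimulation, and the source does not say which one SMILE applies; the function name align_probabilities and the numbers are hypothetical.

```python
def align_probabilities(probs, target_count):
    """Scale probabilities so that their sum (the expected number of events)
    matches target_count.

    Scaled values are capped at 1.0, so the target can be undershot when
    many probabilities hit the cap.
    """
    total = sum(probs)
    if total == 0:
        return list(probs)  # nothing to scale
    factor = target_count / total
    return [min(1.0, p * factor) for p in probs]

# Hypothetical example: the unaligned model expects 0.6 events among three
# persons, while the external projection calls for 1.2.
aligned = align_probabilities([0.1, 0.2, 0.3], target_count=1.2)
```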

DREAM develops and maintains the dynamic microsimulation model SMILE. The model starts with the entire Danish population in a base year (approximately 5.7 million individuals) and simulates the further life course of each individual in this initial population. Transition probabilities, which depend on individual characteristics, are estimated from transitions observed in a recent period.

The SMILE model (Simulation Model for Individual Lifecycle Evaluation) is a Danish dynamic, data-driven microsimulation model. The current version forecasts demography, household structure, education level, socioeconomic characteristics, housing demand, income, taxation, public benefits and labour market pensions.

The SMILE model is dynamic in the sense that an initial population (the entire Danish population of approximately 5.7 million persons) is projected into the future. The SMILE model is a data-driven model, based on rich Danish register data. The data cover the entire Danish population on an annual basis in the period between 1986 and 2013. For each individual, our dataset contains information about the person him-/herself (gender, age, educational background, labour market participation, income etc.), the person’s family situation (single/couple, number of children living at home etc.) and information about the dwelling that the person’s household occupies (location, owner/rental status, dwelling type and size etc.). We derive data from seven different sources made available through Statistics Denmark. The main data sources are the Danish Civil Registration System (CPR-registret), the Housing Register (Bygnings- og Boligregistret, BBR), the education register (Uddannelsesregistret) and the labour force statistics (Registerbaseret Arbejdsstyrkestatistik, RAS).

Demographic events such as death, birth, immigration, emigration etc. are modelled. Projections of death probabilities are based on the Lee-Carter econometric method (Lee & Carter, 1992). The model has been developed to include two regional models: in one the country is subdivided into 98 regions, while the other uses a coarser subdivision into 11 regions. Family structure is modelled through separate events: adult children leaving home, couple formation and couple dissolution. To implement couple formation we deploy our matching algorithm called SBAM (Stephensen & Markeprand, 2013). To obtain a realistic family structure the model includes parity: the fertility coefficients include a component that accounts for previous births in the household.
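For reference, the Lee-Carter method mentioned above is usually written as follows. This is the standard textbook formulation; the source does not state how SMILE estimates or extends it. Here m_{x,t} is the death rate at age x in year t, a_x is the average age profile of log mortality, k_t is a time index of the overall mortality level, and b_x measures how strongly each age responds to changes in k_t.

```latex
\ln m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t},
\qquad \sum_x b_x = 1, \qquad \sum_t k_t = 0

% The time index is commonly projected as a random walk with drift d:
k_t = k_{t-1} + d + e_t
```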

The moving probability is determined by background characteristics of the household and by characteristics of the household's current dwelling.

The modelling of education decisions is based on regionally subdivided transition probabilities calculated from Danish register data, and thus forecasts education levels from historical educational behaviour. The model establishes each person’s ongoing education, the duration of the current education spell and the highest attained education.

The modelling of income and labour market dynamics is subdivided into a labour supply model and an earned income model. The labour supply model first divides individual labour-supply status into gross labour force, retirement and study status. The gross labour force is further subdivided into employment, unemployment, cash-benefit and no-income states, determined by a competing risks model that captures the annual inflows and outflows characterising labour market dynamics. The model includes persistency of employment, intended to account for the disparity in labour market attachment across the population. The employment state is further subdivided into self-employment, assisting spouses and employed. The employment status of students is also determined, by a model based on annual transition probabilities. The retirement event consists of an early retirement scheme, a disability pension scheme and the age-retirement pension scheme.
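The annual movement between the gross labour force states can be pictured as a draw among mutually exclusive outcomes. The sketch below is a deliberately simplified discrete-time illustration, not the estimated SMILE model: the table TRANSITIONS, the function next_state and every probability are hypothetical, and persistency and individual characteristics are ignored.

```python
import random

# Hypothetical annual transition probabilities between gross labour force states.
# Each row lists the mutually exclusive outcomes for one current state and sums to 1.
TRANSITIONS = {
    "employment":   {"employment": 0.90, "unemployment": 0.06, "cash-benefit": 0.02, "no-income": 0.02},
    "unemployment": {"employment": 0.45, "unemployment": 0.40, "cash-benefit": 0.10, "no-income": 0.05},
    "cash-benefit": {"employment": 0.15, "unemployment": 0.10, "cash-benefit": 0.70, "no-income": 0.05},
    "no-income":    {"employment": 0.20, "unemployment": 0.05, "cash-benefit": 0.05, "no-income": 0.70},
}

def next_state(current, rng):
    """Draw next year's state; exactly one of the competing outcomes is realized."""
    outcomes = TRANSITIONS[current]
    states = list(outcomes)
    weights = [outcomes[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

rng = random.Random(7)  # fixed seed so the run is reproducible
path = ["employment"]
for _ in range(10):  # follow one person for ten simulated years
    path.append(next_state(path[-1], rng))
```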

For employed persons and employed students, the model also determines the supply of weekly working hours, and for students a separate model determines the annual weeks of employment. Furthermore, the wage rate is determined by a heterogeneous dynamic model.

The income of the self-employed and of assisting spouses is determined by a dynamic model that includes a heterogeneous persistent term and the duration of the employment spell.

Based on earned income and the individual and family characteristics, we can determine entitlement to social or income-replacing transfers, tax obligations, and contributions to the labour market pension schemes. The labour market pension schemes are personal and are basically life annuities.