The value of apprentices in the care sector: the effect of apprenticeship costs on the mobility of graduates from apprenticeship training

Introduction
Within research on vocational education, one of the most relevant questions is how the costs and benefits of providing training affect the decision to employ apprenticeship graduates after they complete their training. This question matters for the employment biographies of graduates and also sheds light on firms' motivation to train. Regarding this motivation, empirical research basically distinguishes between the production motive and the investment motive (Schönfeld et al. 2010). Companies that follow the production approach use apprentices as a cheap labor input in order to increase the productivity of the firm (Lindley 1975). A firm that follows this approach aims to cover the training costs completely during the training period. Further employment after completing the apprenticeship is therefore usually not planned. By contrast, firms that follow the investment motive primarily provide training in order to ensure a pool of qualified workers who can be employed after graduation (Merrilees 1983). Under this motive, apprenticeship costs exceed benefits at the time of graduation. This paper examines the causal effect of apprenticeship costs on the retention of graduates from apprenticeship training, using apprenticeship training for geriatric nurses in Germany as a case study. How does this approach contribute to general research on vocational education and the labor market, and why is the focus on geriatric nurses important?
The first relevant strand of literature comprises studies that identify the motivation to train. Because firms that follow the production approach often aim for benefits to exceed apprenticeship costs by the completion of training, calculating net costs helps to identify the motivation to train (Mohrenweiser and Zwick 2009; Schönfeld et al. 2010). Alternatively, several studies propose using firms' retention rates to identify the motivation to train (e.g., Franz and Zimmermann 2002; Mohrenweiser and Backes-Gellner 2010; Wenzelmann 2012). Firms that seldom employ apprentices after graduation are assumed to follow the production approach (Schönfeld et al. 2010). Following this, Pfeifer (2016) expects that an exogenous reduction in apprenticeship costs immediately reduces retention rates only in firms that follow the production approach. An exogenous decrease in apprenticeship costs thus helps to identify the motivation to train.
The second relevant strand of literature comprises papers that estimate the relationship between the motivation to train, or apprenticeship costs, and the retention of graduates (Dietrich 2008; Schönfeld et al. 2020). However, empirical evidence on the causal effect of apprenticeship costs on firms' strategy of employing graduates after the completion of apprenticeship is barely available. Both the apprenticeship costs and the decision to employ an apprentice after the completion of apprenticeship training depend on characteristics of the firm and of the apprentice. The relationship between apprenticeship costs and retention therefore suffers from endogeneity, e.g., through reverse causality between apprenticeship costs and retention.
This endogeneity is also a major topic in studies that estimate the effects of individual and firm characteristics on retention rates in firms and on the labor market biographies of graduates from apprenticeship training (e.g., Wagner and Wolf 2013; Mohrenweiser and Zwick 2015; Dummert 2020). The same holds for papers examining the determinants of turnover of nurses.¹
The last strand of literature connected to our empirical analysis comprises studies that attempt to identify exogenous variation in mobility patterns and estimate the effect of mobility after completing apprenticeship training on wages. To identify a causal effect, it is important to account for initial selection into firms and endogenous selection of occupations, which also determine wages (von Wachter and Bender 2006; Lene and Cart 2018). The paper by von Wachter and Bender (2006) uses within-firm variation in retention rates and training firm fixed effects as an exogenous source of individual retention of graduates. Mueller and Schweri (2015) and Fitzenberger et al. (2015) take advantage of variation in regional labor market tightness and further local characteristics in order to estimate the effect of mobility patterns of apprenticeship graduates on wages. While most papers find a negative effect on wages for pure occupation changers and pure employer changers,² Fitzenberger et al. (2015) underline one important fact, which is consistent with the large strand of literature on the turnover of nurses: it is important to distinguish between mobility across occupations, mobility across firms, and both combined. Furthermore, Göggel and Zwick (2012) underline that the size and the direction of the mobility effect differ by industry, occupation, and apprentice salary, and Euwals and Winkelmann (2004) and Lene and Cart (2018) emphasize that time since graduation is important to account for.
However, less is known about the effect of the institutional setting of apprenticeship training on the retention of graduates. So far, only the paper by Brebion (2020) uses exogenous variation in apprenticeship costs and estimates its effect on the retention of apprentices in France. He exploits variation in apprenticeship subsidies that vary by region, amount, and criteria. Consistent with his theoretical model and the predictions by Pfeifer (2016), he finds that a reduction in apprenticeship costs increases the probability of leaving the firm after graduation.
This paper aims to contribute to these strands of literature in several ways. Firstly, we consider the apprenticeship system in geriatric nursing in Germany and take advantage of a unique policy experiment. From 2003 onward, federal states in Germany had the right to introduce a levy that redistributes a substantial part of apprenticeship costs between care facilities that provide training for (potential) geriatric nurses and facilities that do not. We take advantage of the fact that this apprenticeship levy was introduced across the federal states at different points in time. This allows us to remove a substantial part of the endogeneity in the relationship between apprenticeship costs and the retention of graduates. Secondly, by doing this, this study is one of the first to underline the role of the institutional setting for retention rather than examining the effect of individual and firm characteristics. It thus provides a further perspective for studying the determinants of graduates' mobility and the turnover of nurses. Thirdly, we use the exogenous variation in apprenticeship costs as an instrument for endogenous mobility patterns and estimate the effect of mobility on wages. To the best of our knowledge, this is the first paper on the link between mobility and wages after apprenticeship training that uses an institutional setting of apprenticeship training in order to tackle the problem of endogenous mobility decisions.
Fourthly, the focus on the group of geriatric nurses is of particular interest. The labor market is characterized by a severe labor supply shortage (not only in Germany). However, while the motivation to train and the costs and benefits of apprenticeship training have been studied in depth for dual apprenticeship training in Austria, Germany, and Switzerland (Muehlemann and Wolter 2014; Moretti et al. 2017; Schönfeld et al. 2020), similar data is virtually non-existent for care facilities. Because the good produced by geriatric nurses is the state of health of persons in need of care, identifying the motivation to train has particularly important implications in the care sector.
Our event study design shows that the levy scheme increases the probability of switching employer after completing apprenticeship training by 10 percentage points. Further results of the Cox hazard model demonstrate that this effect is not driven by simultaneous reforms of the German care sector or by general trends in the apprenticeship market for care workers. However, we find that the treatment effect varies by the size of care facilities and is larger in small facilities. Furthermore, we use this quasi-experiment to estimate the effect of graduates' mobility on their wages. We find that leaving the training facility decreases the probability of earning a wage in the top quartile of the wage distribution.
The structure of the paper is as follows. In the next section, we describe the underlying levy and the apprenticeship market of geriatric nurses. In section "Data and methods", we present the data used and the empirical approach. In section "Empirical results and discussion", we show and discuss the empirical results. Finally, we conclude.

Institutional setting
The labor market of geriatric nurses is characterized by a large labor supply shortage, which became more serious during the last decade (Federal Employment Agency 2020). Besides the rising number of persons in need of care in Germany, the low salary compared to nurses working in hospitals, poor working conditions, and weak family-work schemes are important reasons why the turnover of geriatric nurses is so large, not only in Germany (Simon et al. 2010). Seibert and Wydra-Somaggio (2017) estimate that, due to the large labor supply shortage, graduates from apprenticeship training often change employer after graduation and that unemployment after completing the apprenticeship is rare. Based on data of the Sample of the Integrated Employment Biographies (IEB), the authors find that 35% of geriatric nurses who graduated in 2013 and 2014 leave the training facility after graduation and change employer. At the same time, nurses who leave the care facility in which they completed the apprenticeship very often stick to the profession of geriatric nursing: 27% of graduates in geriatric nursing leave the care facility after completing apprenticeship training but stay in their occupation at a new employer.
Unions often argue that care facilities follow the production approach, with negative consequences for the quality of apprenticeship: "Apprentices are used as skilled assistants and are not prepared for the working world as specialists. If we want to have motivated colleagues in the care facilities in the future, more space must be given to apprenticeship training." (Tweet by Bochumer Bund from 15th June 2020)³ However, evidence that supports or contradicts this claim and that identifies the motivation to train is currently lacking. To address this research gap, let us consider the institutional setting of apprenticeship training in geriatric nursing in more detail.
In response to the high turnover and the lack of labor supply, German policymakers reorganized the apprenticeship training in geriatric nursing.⁴ Since 2003, apprenticeship training in geriatric nursing has been regulated at the national level and requires an appropriate apprentice salary (German Geriatric Nursing Care Act, GGNCA). Apprenticeship training takes 3 years and consists of 2500 h of practical training in care facilities and 2100 h of theoretical classes at school. Graduates of intermediate secondary school (Realschule) or of other school education of 10 years that extends lower secondary school (Hauptschule) are eligible to begin an apprenticeship in geriatric nursing. However, graduating from lower secondary school is also sufficient if the person previously completed a two-year apprenticeship or holds a permit to work as an auxiliary nurse, which can be obtained by completing a one-year apprenticeship in auxiliary nursing (for more details, see Zöller 2017).
The German Geriatric Nursing Care Act gives the federal states the opportunity to introduce a levy scheme (§ 25 (1a) GGNCA) if apprenticeship costs can no longer be financed by nursing charges⁵ and if the levy system is necessary to prevent or remove a labor supply shortage (German Federal Parliament 2010). The levy scheme obliges inpatient and semi-residential care facilities and ambulatory nursing services to pay contributions to a levy pot, regardless of whether they engage in apprenticeship training or not. These contributions finance apprentice salaries and the costs of continuing vocational education and training (CVET).
The levy scheme is organized as follows. Firstly, the levy pot is assessed by the responsible administration at the level of the federal states. Overall, this amount depends on the number of apprentices in geriatric care, on their annual apprentice salary according to the collective agreement, and on the amount of CVET costs. Secondly, after the levy sum has been determined, it is distributed across the two care sectors (inpatient/semi-residential care facilities and ambulatory nursing services) according to the ratio of nurses per sector to the overall number of employed nurses in both sectors. As a last step, each care facility pays a contribution according to its annual sum of caring services, which depends on the degree of care dependency of the patients and the sector considered. Usually, the average annual apprentice salary that needs to be redistributed per apprentice exceeds €17,000, which is a substantial degree of redistribution.
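The three steps above can be illustrated with a minimal sketch. It assumes strictly proportional allocation at each step, and all names (`Facility`, `levy_pot`, `contributions`) and numbers are hypothetical; the actual administrative formulas of the federal states may differ.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    sector: str           # "inpatient" (incl. semi-residential) or "ambulatory"
    care_services: float  # annual sum of caring services (hypothetical measure)


def levy_pot(n_apprentices, salary_per_apprentice, cvet_costs):
    """Step 1: amount to be financed in one federal state."""
    return n_apprentices * salary_per_apprentice + cvet_costs


def contributions(pot, nurses_by_sector, facilities):
    """Steps 2 and 3: split the pot by sector, then by care services."""
    total_nurses = sum(nurses_by_sector.values())
    result = {}
    for sector, n_nurses in nurses_by_sector.items():
        # Step 2: the sector's share equals its share of employed nurses.
        sector_amount = pot * n_nurses / total_nurses
        members = [f for f in facilities if f.sector == sector]
        sector_services = sum(f.care_services for f in members)
        # Step 3: each facility pays in proportion to its care services,
        # whether or not it trains apprentices (assumed proportional rule).
        for f in members:
            result[f.name] = sector_amount * f.care_services / sector_services
    return result


# Hypothetical example values, not taken from the paper.
pot = levy_pot(n_apprentices=100, salary_per_apprentice=17_000, cvet_costs=50_000)
facilities = [
    Facility("A", "inpatient", 300.0),
    Facility("B", "inpatient", 100.0),
    Facility("C", "ambulatory", 200.0),
]
shares = contributions(pot, {"inpatient": 600, "ambulatory": 400}, facilities)
```

In this sketch the contributions always sum to the levy pot, which mirrors the redistributive character of the scheme: the pot is fully financed by all facilities, and training facilities are refunded from it.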
Five federal states introduced the levy scheme.⁶ These treatment states introduced the levy scheme at different points in time, which serves as a quasi-experimental setting. While the levy scheme became effective in 2005 in Rhineland-Palatinate and in 2006 in Baden-Wuerttemberg, North Rhine-Westphalia and Saarland introduced it in 2012. Hamburg followed in 2014. Nine control federal states remain that did not introduce the levy scheme.

⁴ The following exposition of the institutional setting of apprenticeship training in geriatric nursing is strongly based on Schuss (2021), who describes the institutional setting in greater detail.
⁵ Care facilities charge nursing charges for caring services and social care. Charges depend on the individual need for care and are mutually determined by providers of care insurance, providers of care facilities, and providers of social security (see Kochskämper 2019).
⁶ Bremen and Saxony are excluded from the analysis. Bremen initially introduced compensation payments only for the apprenticeship in auxiliary nursing, and Saxony introduced the levy scheme, which, however, was later declared void by the Federal Administrative Court (German Federal Parliament 2010).
The five treatment states differ in two aspects of the design of the levy scheme that are worth mentioning. Firstly, while Hamburg, North Rhine-Westphalia, and Saarland refund the full amount of apprentice salary including social security contributions (100%), Baden-Wuerttemberg and Rhineland-Palatinate refund only a fraction and differ in the relative refunding between inpatient and ambulatory facilities.⁷ Secondly, Hamburg, Rhineland-Palatinate, and Saarland introduced compensation payments for both training geriatric nurses and training auxiliary nurses.

Data
[...] Research (IAB) and via remote data access at the FDZ. The data set combines different administrative sources to indicate the exact employment status on a daily basis (Antoni et al. 2019). These sources capture employees as well as recipients of different social and unemployment benefits, job seekers, and people currently participating in measures of the Federal Agency of Employment. The integration of these administrative sources allows collecting appropriate information on occupation, sector, (un-)employment, and apprenticeship training in spell format. Because the exact start and end dates of each spell are known on a daily basis, the graduation from apprenticeship training and the tenure at a new employer can be observed quite accurately.
To generate the sample of apprentices who successfully completed training in geriatric nursing, we follow Fitzenberger et al. (2015) and Dummert (2020), who used the same data basis. Geriatric nurses are captured by the three-digit Classification of Occupations (KldB) 1988 and 2010 (Paulus and Matthes 2013). To separate geriatric nurses from social workers, who belong to the same three-digit category in the 1988 classification, we use the five-digit Classification of the Economic Sector (WZ). This sector classification clearly identifies inpatient and semi-residential care facilities and ambulatory nursing services. Because the three-digit occupation classification does not allow us to distinguish between apprentices in skilled geriatric nursing and apprentices in auxiliary nursing, we calculate the number of days an individual is observed as an employed apprentice in geriatric nursing. By excluding persons with a training duration of less than one and a half years, we exclude apprentices in auxiliary nursing, whose training usually lasts 1 year. This choice also allows for the possibility of shortening the 3-year training in geriatric nursing. Furthermore, we allow for a maximum training duration of 5 years.
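The duration rule described above can be sketched as follows. This is only an illustration under assumptions: spells are taken to be (start, end) date pairs, and the exact day thresholds (548 days for one and a half years, 5 × 365 days for five years) are placeholders, not values reported in the paper.

```python
from datetime import date

MIN_DAYS = 548      # assumed cut-off for "one and a half years"
MAX_DAYS = 5 * 365  # assumed cut-off for "5 years"


def training_days(spells):
    """Total days observed as an employed apprentice in geriatric nursing."""
    return sum((end - start).days + 1 for start, end in spells)


def is_skilled_apprentice(spells):
    """Keep durations long enough to rule out one-year auxiliary training."""
    return MIN_DAYS <= training_days(spells) <= MAX_DAYS


# Hypothetical spells: a regular 3-year training and a 1-year auxiliary training.
three_year = [(date(2010, 9, 1), date(2013, 8, 31))]
one_year = [(date(2010, 9, 1), date(2011, 8, 31))]
```

With these hypothetical spells, `is_skilled_apprentice(three_year)` is true and `is_skilled_apprentice(one_year)` is false, so the auxiliary apprentice is dropped from the sample.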
We use the imputed educational variable (Fitzenberger et al. 2005; Thomsen et al. 2018) to observe completion of apprenticeship training. Completion is observed when an individual holds no vocational degree during apprenticeship training and switches to holding a vocational degree afterwards. This procedure does not make it possible to include graduates in geriatric nursing who received another vocational degree before. Moreover, time lags are possible in the data between the graduation from apprenticeship training and the point in time when this change is recorded. As a consequence, a change in the educational variable can sometimes only be identified after a change of employer (Fitzenberger et al. 2015). To tackle this issue, we also use the employment status to indicate when a person switches from being an apprentice to being regularly employed. Furthermore, it is not possible to identify apprentices in skilled geriatric nursing who do not graduate from this apprenticeship training. This is why we exclusively consider apprentices who successfully graduate from apprenticeship training in skilled geriatric nursing throughout this paper. We exclude individuals who hold an academic degree or who are older than 30 at the end of apprenticeship training. Because there are some institutional differences between apprenticeship training in geriatric nursing and dual apprenticeship training, we allow for some deviations regarding the age and the training duration limit compared to prior papers that identify apprenticeship graduates with the same data set. For instance, apprenticeship training in geriatric nursing can be started in any month of the year. Restricting the beginning of training to a particular month as in Dummert (2020) is therefore not feasible.
Regarding the mobility pattern of graduates, the longitudinal character of the data set allows us to follow graduates after graduation and to distinguish between multiple patterns of mobility. Furthermore, it is possible to identify whether the individual switches firm but sticks to the profession of geriatric nursing or whether the individual also switches profession. Because of the high self-identification of geriatric nurses with their profession and the specificity of the apprenticeship training, we define a switch of profession as a graduate no longer working as a geriatric nurse and working outside of (semi-)inpatient care facilities and ambulatory nursing services. This is in contrast to the standard definition in the literature, which defines a profession switch at the two- or three-digit level of the job classification (Fitzenberger et al. 2015; Seibert and Wydra-Somaggio 2017). However, regarding our research question, this choice seems more plausible because our definition is linked to eligibility for the levy scheme.
Our final sample consists of 654 graduates of geriatric nursing. A substantial share of 35.6% of graduates leave the training facility immediately after completing apprenticeship training. Consistent with Seibert and Wydra-Somaggio (2017), the majority of graduates who leave the training facility after graduation stick to the profession of geriatric nursing. Only a small share of about 5% of the facility switchers leave the profession; however, most of them continue working in another profession within the health services sector. Facility switchers leave the training facility on average 9 months after graduation. However, the median duration between graduation and the facility switch is 23 days, which reflects the right skewness of this variable. Table 1 presents characteristics of graduates and care facilities by mobility status. First of all, the median duration of training is consistent with the standard legal duration of training of 3 years. Further characteristics reflect the endogeneity in the decision whether to leave the training facility or not. Switchers are younger and completed the apprenticeship training more often in smaller care facilities and more often in ambulatory nursing services. Additionally, switchers live in counties with fewer school leavers.
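The identification of graduation from the educational variable and the employment status, as described above, can be sketched as follows. The spell layout (dictionaries with `status` and `vocational_degree` keys) and the way the two signals are combined are assumptions made for illustration, not the paper's actual coding.

```python
def graduation_index(spells):
    """Return the index of the first spell after training, or None.

    A person counts as a graduate only if a vocational degree is
    eventually observed. The switch is dated by whichever comes first:
    the degree being recorded or the change from apprentice to regular
    employment (this guards against late recording of the degree).
    """
    if not any(s["vocational_degree"] for s in spells):
        return None
    for i in range(1, len(spells)):
        prev, cur = spells[i - 1], spells[i]
        if prev["status"] == "apprentice" and not prev["vocational_degree"]:
            if cur["vocational_degree"] or cur["status"] == "employed":
                return i
    return None


# Hypothetical biography: the degree is recorded one spell too late.
lagged = [
    {"status": "apprentice", "vocational_degree": False},
    {"status": "employed", "vocational_degree": False},
    {"status": "employed", "vocational_degree": True},
]
# Hypothetical biography without any recorded degree.
no_degree = [
    {"status": "apprentice", "vocational_degree": False},
    {"status": "employed", "vocational_degree": False},
]
```

For the lagged biography the function returns index 1, i.e. graduation is dated by the change in employment status rather than by the delayed change in the educational variable.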

Methods and identification strategy
Usually, estimating the effect of apprenticeship costs on mobility of graduates results in a biased coefficient. Characteristics such as firm size and apprentice salary determine the sum of apprenticeship costs and the firm's retention strategy. Furthermore, the decision whether to leave the firm after graduating or not, depends on (unobserved) characteristics of the graduate. In order to estimate the causal effect of apprenticeship costs on mobility of graduates, it is thus important to identify some kind of exogenous variation in apprenticeship costs that is uncorrelated with characteristics of the firm and characteristics and future plans of the graduate.
In order to tackle this issue of endogeneity, we estimate the effect of the levy scheme on mobility patterns of graduates in geriatric nursing and exploit the fact that the reform was implemented across the federal states at different points in time. We model the decision whether to leave the training facility after graduating within the framework of the proportional Cox hazard model. With this method, the move from the training facility to another employer or into unemployment can be modeled as a function of the number of days since completing apprenticeship training.⁸ It should be noted that we also estimate Eq. (1) by logit regression in order to facilitate the interpretation of the treatment effect.
The probability of leaving the training facility $h_{ijt}$ depends on characteristics of the apprentice and of the care facility $X'_{it}$ and on the number of school leavers in county $k$ and its 1-year lag $S'_{kt,t-1}$.⁹ We also control for federal state dummies $\theta_j$ and for dummies for the year in which apprenticeship training began, $yr\_begin$, in order to control for differences between federal states and time trends in the retention strategy of care facilities.¹⁰ Furthermore, as recommended by Bertrand et al. (2004), standard errors in Eq. (1) are clustered at the level of the treatment, which is the level of federal states.
Previous studies define the term mobility (or retention) of graduates in different ways, depending on the unit of observation in the data used. If aggregated data is available, retention rates can be calculated at the level of firms or professions (e.g., Mohrenweiser and Backes-Gellner 2010; Brebion 2020). By contrast, if individual data is available, it is common to model mobility in the form of a categorical event. Following previous research (e.g., Simon et al. 2010; Fitzenberger et al. 2015; Mueller and Schweri 2015), we define the choice of leaving the training facility in such a categorical form according to Eq. (1). We observe whether a graduate has left the training facility within a certain number of days $t$ since graduation from apprenticeship training. Moreover, we are able to distinguish between graduates who leave the training facility but continue to work in care facilities or ambulatory care services as geriatric nurses and graduates who leave the profession of geriatric nursing altogether (see section "Data").
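Based on the definitions above, Eq. (1) could be sketched as a proportional hazards specification. This is a reconstruction under assumptions rather than the authors' exact formula: the baseline hazard $h_0(t)$ is the standard Cox ingredient, the coefficient symbols $\gamma$ and $\delta$ are placeholder names introduced here, $yr\_begin$ stands for the set of start-year dummies, and $\mathit{Treated}_{ijt}$ is the treatment indicator with coefficient $\beta$.

```latex
h_{ijt}(t) = h_0(t)\,\exp\!\left(
    \beta\,\mathit{Treated}_{ijt}
    + X'_{it}\gamma
    + S'_{kt,t-1}\delta
    + \theta_j
    + yr\_begin
\right)
```

Under this form, $\exp(\beta)$ is the hazard ratio associated with the levy scheme: a value above one means that graduates who trained under the levy leave the training facility at a higher rate at any given number of days since graduation.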
According to the theoretical framework by Brebion (2020), firms base their motivation to train on the conditions applying at the start of apprenticeship contracts. This is consistent with the definition of the production approach, under which benefits exceed the costs over the entire apprenticeship training up to its completion. This implies that the decision whether to employ apprentices upon completion depends on the conditions at the start of apprenticeship training. The treatment indicator $\mathit{Treated}_{ijt}$ thus equals one if the federal state introduced the levy scheme at or before the start of apprenticeship training. This approach uses both regional variation in the institutional setting by federal states and temporal variation according to the starting date of apprenticeship training.
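The construction of the treatment indicator can be sketched as follows, using the introduction years stated in the institutional setting. Working at the level of calendar years is a simplifying assumption made here; the function and dictionary names are hypothetical.

```python
# Year in which the levy scheme became effective, by treatment state.
LEVY_START = {
    "Rhineland-Palatinate": 2005,
    "Baden-Wuerttemberg": 2006,
    "North Rhine-Westphalia": 2012,
    "Saarland": 2012,
    "Hamburg": 2014,
}


def treated(state, training_start_year):
    """1 if the state had introduced the levy at or before training start."""
    levy_year = LEVY_START.get(state)  # None for control states
    return int(levy_year is not None and levy_year <= training_start_year)
```

For example, `treated("Hamburg", 2013)` gives 0 and `treated("Hamburg", 2014)` gives 1, while any control state (here, any state not in `LEVY_START`) always gives 0.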
In order to interpret β as causal, the introduction of the levy scheme should not be correlated with individual characteristics and attributes of the care facilities (balance assumption). Furthermore, the pre-reform trends in mobility patterns of apprenticeship graduates and in other variables describing the labor and apprenticeship market of geriatric nurses (wages, type of contracts) should be similar between the treatment and the control group (common trend assumption). While the common trend assumption is examined in greater detail in Table 3, Table 6 checks the balance assumption by regressing a binary treatment variable on explanatory variables for the pre-reform periods. It should be noted that the dependent variable equals one if the graduate resides in a federal state that has not yet introduced the levy scheme but will do so later. The treatment indicator equals zero if the graduate resides in a federal state that will never introduce the levy scheme. In this way, we can analyze whether the composition of treatment and control states differs significantly before the reforms. The results demonstrate that only the variables age and German citizenship significantly affect the probability of being in the treatment group. This indicates that graduates in federal states that introduced the levy are slightly older and less often hold German citizenship than graduates in control states. This may reflect that graduates in East Germany, where none of the federal states introduced the levy scheme, are younger and that federal states in East Germany are less open to foreign workers. This motivates further sensitivity checks in the section "Heterogeneous effects", which perform separate analyses for East and West Germany in order to examine whether the effect of the levy scheme is driven by East-West differences in observable characteristics.
Schuss Empirical Res Voc Ed Train (2021) 13:15
Table 6 controls for additional regional variables such as the number of persons in need of care across counties. Regarding the number of nurses relative to the number of care-dependent persons in inpatient facilities, there are significant but small differences. Thus, the insignificant coefficients of the number of care-dependent persons and the number of slots per 1000 inhabitants above the age of 64 suggest that the labor demand for geriatric nurses does not differ significantly between federal states that introduced the levy scheme and federal states that did not. A further explanation of why some federal states introduced the levy scheme and some did not is the political agenda of the state governments. Generally, social democratic and left-wing parties are less skeptical toward redistribution than liberal or conservative parties. However, while three treatment states were governed by the German social-democratic party (SPD), two treatment states were governed by a government headed by the German conservative party (CDU) at the time the reforms were enacted. Thus, the political agenda of the federal states also cannot fully explain why some federal states introduced the levy scheme and some did not. However, one difference is that all treatment states are located in West Germany. One reason for this may be that East Germany is more affected by the sector-specific minimum wage introduced in the care sector in July 2010 and that policymakers in East Germany could interpret the levy scheme and the minimum wage as a double burden on care facilities. Whether an exclusive focus on West Germany changes our estimation results is examined in detail in the section "Heterogeneous effects".
After examining the relationship between the levy scheme and the mobility patterns of graduates, we estimate the effect of mobility on wages in a second step. To perform a two-step instrumental variables (IV) estimation, we first estimate the probability of leaving the training care facility ($\mathit{mobility}_{ijt}$) as a function of the introduction of the levy scheme and further control variables by estimating Eq. (1). In the second step, we estimate the effect of the estimated mobility probability on an indicator of earnings, $\mathit{wage}_{ijt}$.
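The two-step procedure can be sketched in equation form. The linear specification below is an assumption made for illustration (the first step in the text is based on Eq. (1)), and the coefficient symbols $\pi$, $\lambda$, $\gamma_1$, $\gamma_2$, $\delta_1$, $\delta_2$ and the error terms $u_{ijt}$, $\varepsilon_{ijt}$ are placeholder names introduced here; the state and start-year dummies are stage-specific.

```latex
\begin{align}
\mathit{mobility}_{ijt} &= \pi\,\mathit{Treated}_{ijt} + X'_{it}\gamma_1
    + S'_{kt,t-1}\delta_1 + \theta_j + yr\_begin + u_{ijt} \\
\mathit{wage}_{ijt} &= \lambda\,\widehat{\mathit{mobility}}_{ijt} + X'_{it}\gamma_2
    + S'_{kt,t-1}\delta_2 + \theta_j + yr\_begin + \varepsilon_{ijt}
\end{align}
```

In this sketch, $\lambda$ is the effect of mobility on the wage indicator, identified from the variation in mobility that is induced by the levy scheme. The treatment indicator enters only the first equation, which is the exclusion restriction the instrument relies on.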

Baseline results
We start by presenting empirical results of Eq. (1) obtained with the proportional Cox hazard model. We gradually add characteristics of graduates, care facilities, and information on school graduates in counties throughout Models 1-3. The hazard ratio indicates a significant positive effect of the levy scheme on the probability of leaving the training facility. This estimated hazard ratio remains robust in magnitude throughout Models 1-3. To get a better understanding of the size of the effect, we also apply logit estimations. According to these, the introduction of the levy scheme increases the probability of leaving the training firm by 9.8 percentage points. Compared to the share of graduates who leave their training facility at some point after graduation (68.7%), this is a relative effect of 14.3%.
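The relative effect size follows directly from the two reported numbers; a short check, with illustrative variable names:

```python
ame_pp = 9.8               # average marginal effect in percentage points
baseline_share_pct = 68.7  # share of graduates who leave at some point, in %

# Relative effect: marginal effect as a share of the baseline probability.
relative_effect_pct = 100 * ame_pp / baseline_share_pct
print(round(relative_effect_pct, 1))  # 14.3
```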
In order to link these findings with care facilities' motivation to train, we need a closer look at the mobility patterns right after graduation. Following the definition of the production approach, we expect that the levy scheme affects the decision to leave the training facility right after graduation. Figure 1 therefore illustrates the treatment effect as a function of the number of days since graduation (according to Model 3 of Table 2). The figure demonstrates that the treatment effect is significantly positive throughout the whole process. In particular, right after graduation (at t = 0), the levy scheme produces a difference in the probability of leaving the training firm of around 10 percentage points. We explicitly check this by re-estimating Model 3 of Table 2 by logit estimation and by modeling the probability that graduates leave the training facility right after graduation (at t = 0). This gives a significant treatment effect of 0.101 that differs only slightly from the marginal effect in Model 3 of Table 2 of 0.098.¹¹

Table 2 Empirical effects of the levy scheme on the probability of leaving the training facility
Stars denote significance of hazard ratios (HR) or average marginal effects (AME): * p < 10%, ** p < 5%, *** p < 1%; standard errors clustered at the level of federal states are in parentheses. This table estimates Eq. (1) and presents the effect of the levy scheme on the probability of leaving the training facility (parameter β). Estimates are performed by the Cox hazard model and by logit regressions. Throughout Models 1-3, we gradually add the covariates summarized in Table 1. In Model 4, we exclusively consider facility stayers and facility leavers who stick to the profession of geriatric nursing. Thus, we exclude graduates whose first new workplace after graduation does not belong to the profession of geriatric nursing. In Model 5, we only include graduates who stay at the training facility and graduates who leave both the training facility and the profession. Thus, in this model, we model the probability of leaving the profession after graduation from apprenticeship training.

In Fig. 1, the gap between treatment and control states increases during the first year after graduation. Because the length of unemployment spells, the share of full-time contracts, and the share of fixed-term contracts of graduates who leave the training facility during the first months after graduation do not differ significantly between the treatment and control group, this widening gap is puzzling. However, it is not unusual for care facilities not to adjust their retention strategy to the levy scheme directly after graduation. It can be presumed that, similar to firms in other sectors, not every care facility can be clearly assigned to the production approach and that there are care facilities that do not follow a distinct type of motivation to train. Moreover, the gap between the treatment and control states remains constant once time since graduation exceeds 1 year. We thus conclude from Fig. 1 that the significance of the treatment effect displayed in Table 2 is not driven by the long observation period. Instead, the significant treatment effect is mainly attributable to the time right after graduation.
Fig. 1 The share of graduates who stay at the training facility by time since graduation and treatment status (in percent). The graph displays the share of graduates who have not yet left the training facility as a function of days since graduation and treatment status. Survival rates are calculated based on Model 3 of Table 2, conditioning on the variables summarized in Table 1. An individual is assigned to the treatment group if he or she began the apprenticeship training at a point in time when the federal state in which the individual resides had already introduced the levy scheme. Source: Sample of Integrated Labour Market Biographies 1975-2017 (SIAB 7517v1); Regional Database of the Federal Statistical Office and the Federal Employment Agency, own illustration

The group of graduates who leave the training facility consists of two different groups of facility switchers: those who stick to the profession of geriatric nursing and those who leave the profession of geriatric nursing because of working conditions, salary, family-work conflicts, and further reasons. The effect of the levy scheme is expected to differ between these two groups of switchers. We presume that the estimation will be clearer and more efficient for the first group. Regarding the second group, we expect that the levy scheme no longer affects the decision to leave the training facility because this group mainly consists of graduates who leave the training facility by their own choice, which is related to the attractiveness of the profession. Furthermore, there is no reason why a reform of the financing of apprenticeship training should affect the attractiveness of working in care facilities. These hypotheses are tested in Models 4-5 in Table 2 by performing pairwise Cox hazard regressions.
In Model 4 of Table 2, we exclusively consider facility stayers and facility leavers who stick to the profession of geriatric nursing. Thus, we exclude graduates whose first new workplace after graduation does not belong to the profession of geriatric nursing. As a consequence, the treatment effect increases in the Cox model. Although the logit regression results in a slightly smaller AME, this estimate is less noisy and has a smaller standard error due to the exclusion of facility switchers who leave the profession.
In Model 5 of Table 2, we only include graduates who stay at the training facility and graduates who leave both the training facility and the profession. Thus, in this model, we model the probability of leaving the profession after graduation from apprenticeship training. In this case, the effect of the levy scheme loses significance. In the logit regression, the effect no longer differs significantly from zero, and, in addition, its size is close to zero. It should be noted that Model 5 includes graduates who first leave the training facility while sticking to the profession of geriatric nursing but eventually leave the profession at some later point in time. In Model 5, we only exclude graduates who left the training facility but stayed in the profession of geriatric nursing during the entire observation period.
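The two pairwise samples can be written down as simple membership rules. The sketch below reflects our reading of the sample definitions in the text; the class `Graduate`, its field names, and the four example records are hypothetical and are not taken from the SIAB data.

```python
from dataclasses import dataclass


@dataclass
class Graduate:
    label: str
    left_facility: bool                # left the training facility at some point
    first_new_job_in_profession: bool  # only meaningful if left_facility is True
    ever_left_profession: bool         # left geriatric nursing at any time observed


def in_model4(g: Graduate) -> bool:
    """Stayers plus leavers whose first new workplace is in geriatric nursing."""
    return (not g.left_facility) or g.first_new_job_in_profession


def in_model5(g: Graduate) -> bool:
    """Stayers plus leavers who leave the profession at some point."""
    return (not g.left_facility) or g.ever_left_profession


# Hypothetical records covering the four cases discussed in the text.
graduates = [
    Graduate("stayer", False, False, False),
    Graduate("switcher_stays_in_profession", True, True, False),
    Graduate("switcher_later_leaves_profession", True, True, True),
    Graduate("switcher_leaves_profession_directly", True, False, True),
]

model4_sample = [g.label for g in graduates if in_model4(g)]
model5_sample = [g.label for g in graduates if in_model5(g)]
print(model4_sample)
print(model5_sample)
```

Under these rules, a graduate who first switches within the profession and later leaves it enters both samples, which matches the remark on Model 5 above.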

Sensitivity analysis
The identification strategy exploits variation in apprenticeship costs by federal state and by the starting date of apprenticeship training around the implementation of the levy scheme. The long observation period makes it possible to control for within-person autocorrelation and unobserved heterogeneity. However, the long-term character of the empirical analysis also leaves some room for endogenous anticipation on the part of care facilities, for instance, if we compare an individual who started apprenticeship training 5 years before the introduction of the levy scheme to an individual who started apprenticeship training 5 years after the reform.
We tackle this issue by restricting the sample more closely around the introduction of the levy scheme. We calculate the number of days until the levy scheme is introduced ($D_{ij} - D_0 < 0$) and the number of days since the levy scheme was introduced ($D_{ij} - D_0 > 0$) and then estimate Eq. (1) again. By restricting the estimation sample to two, three, and four years before and after the implementation of the levy scheme, we take the issue of possible anticipation into account. Models 1-3 of Table 3 demonstrate that the positive effect remains significant and that its magnitude increases in both the Cox hazard model and the logit regression. Apparently, endogenous anticipation biases the estimates downward. In the logit regressions, the estimated effect increases to 17.5 percentage points if the range is restricted to 4 years around the implementation of the levy scheme.
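The sample restriction around the reform date can be sketched as a simple filter on the signed distance in days. This is an illustration under stated assumptions: the function `within_window`, the reform date, and the training start dates are hypothetical, and a year is approximated by 365.25 days.

```python
from datetime import date


def within_window(start: date, reform: date, years: int) -> bool:
    """True if training began at most `years` years before or after the reform."""
    days_from_reform = (start - reform).days  # negative before the reform, positive after
    return abs(days_from_reform) <= years * 365.25


# Hypothetical reform date and training start dates, for illustration only.
reform_date = date(2012, 1, 1)
starts = [
    date(2007, 9, 1),
    date(2009, 3, 1),
    date(2010, 9, 1),
    date(2013, 9, 1),
    date(2015, 10, 1),
]

# Number of observations kept for each window width (two, three, four years).
sample_sizes = {
    k: sum(within_window(s, reform_date, k) for s in starts) for k in (2, 3, 4)
}
print(sample_sizes)
```

Narrower windows keep fewer observations but compare graduates whose training began closer to the reform, which is the trade-off behind Models 1-3 of Table 3.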
Models 4 and 5 explicitly consider the common trend assumption. First, we perform a placebo test, which also helps to check whether the positive treatment effect can be clearly attributed to the levy scheme rather than to a mere time trend of decreasing retention. We assume that each federal state in the treatment group introduced the levy scheme 2 years earlier. Model 4 demonstrates that in this case, no significant link between the levy scheme and the mobility probability is detectable. In Model 5, we perform another standard check of the common trend assumption by adding to the regression interactions between a time trend and labor-market-specific pre-reform controls of graduates, measured in the first year in which graduates were part of the sample. This yields results quite similar to the baseline estimation in Model 3 of Table 2.
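The placebo assignment can be illustrated by shifting each treated state's reform date two years back and recomputing the treatment indicator. This is a minimal sketch: the state labels, reform dates, and helper names are hypothetical, and the rule that training starting on the reform date counts as treated is our assumption.

```python
from datetime import date
from typing import Optional


def shift_years(d: date, years: int) -> date:
    """Shift a date by whole years (assumes the date is not 29 February)."""
    return d.replace(year=d.year + years)


def treatment_indicator(start: date, reform: Optional[date]) -> int:
    """1 if training began on or after the state's reform date, else 0."""
    return int(reform is not None and start >= reform)


# Hypothetical reform dates; None marks a state that never introduces the levy.
reforms = {"A": date(2012, 1, 1), "B": None}
placebo_reforms = {
    state: (shift_years(d, -2) if d is not None else None)
    for state, d in reforms.items()
}

# A graduate in state A who started training in September 2010.
start = date(2010, 9, 1)
actual = treatment_indicator(start, reforms["A"])
placebo = treatment_indicator(start, placebo_reforms["A"])
print(actual, placebo)  # prints "0 1"
```

If the placebo indicator showed a significant effect, this would point to a pre-existing trend rather than to the levy scheme.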
Finally, we check whether parallel reforms affecting the care sector bias our estimation results. In July 2010, the sectoral minimum wage in the care sector was introduced and was set at €8.50 in the federal states of West Germany and at €7.50 in the federal states of East Germany (Boockmann et al. 2011; Harsch and Verbeek 2012). To control for the influence of this reform, Model 6 includes an additional variable, Post2010, which equals one for years since 2010. The hazard ratio and the AME do not change and remain significant. Furthermore, the introduction of the minimum wage is not correlated with the introduction of the levy scheme. As shown in Table 6, wages in care facilities are not correlated with the treatment status.
This table estimates Eq. (1) and presents the effect of the levy scheme on the probability of leaving the training facility (parameter β). Each estimation controls for the covariates summarized in Table 1. Estimates are performed by the Cox hazard model and by logit regressions. Throughout Models 1-3, we restrict the sample to graduates who began their apprenticeship training at most two, three, or four years before or after the introduction of the levy scheme. In Model 4, it is assumed that each federal state in the treatment group introduced the levy scheme 2 years earlier than was actually the case. In Model 5, we add interactions between a time trend and labor-market-specific pre-reform controls of graduates measured in the first year in which graduates were part of the sample. In Model 6, the introduction of the minimum wage is taken into account and the estimation controls for an additional variable, Post2010, which equals one for years since 2010.

Heterogeneous effects
The stepwise addition of covariates and the decrease of the treatment effect across Models 1-3 in Table 2 indicate that the effect of the levy scheme is correlated with characteristics of graduates and care facilities. To check whether the baseline results mask some heterogeneity, Table 4 examines heterogeneous effects with regard to several selected observable characteristics. First, we exclude the federal states of East Germany from the analysis. As the levy scheme has only been introduced in federal states of West Germany, it is conceivable that the effects are driven by different retention rates between West and East Germany. Model 1 demonstrates that the hazard ratio stays significant and decreases only slightly. In Models 2-4, we test whether some small groups drive the treatment effect. Male graduates, graduates with a university entrance qualification, and graduates working in ambulatory nursing services account for only a small share of our sample. If we exclude male graduates or graduates with a university entrance qualification, the treatment effect increases (Models 2 and 3). Only if we exclude graduates working in ambulatory nursing services does the treatment effect decrease slightly, but it is still significantly different from zero.
Finally, we find that the treatment effect varies by the size of the care facility. If we split the sample at the median number of employees per care facility (86.5), the treatment effect is significant in both small and large care facilities. However, the relationship between the levy scheme and the mobility patterns of graduates is stronger in small care facilities.
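The split into small and large facilities can be sketched as follows. The facility identifiers and employee counts are hypothetical and were chosen only so that their median equals the 86.5 reported in the text; the assignment of facilities exactly at the median to the small group is our assumption.

```python
from statistics import median

# Hypothetical facility sizes (number of employees), for illustration only.
facility_sizes = {"f1": 12, "f2": 45, "f3": 80, "f4": 93, "f5": 150, "f6": 310}

cutoff = median(facility_sizes.values())

# Facilities at or below the median count as small, the rest as large.
small = sorted(f for f, n in facility_sizes.items() if n <= cutoff)
large = sorted(f for f, n in facility_sizes.items() if n > cutoff)

print(cutoff)  # prints 86.5
print(small, large)
```

Eq. (1) would then be estimated separately on the two subsamples, as in Models 5 and 6 of Table 4.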

Table 4 Heterogeneous effects
Stars denote significance of hazard ratios (HR) or average marginal effects (AME): * p < 10%, ** p < 5%, *** p < 1%; standard errors clustered at the level of federal states are in parentheses. This table estimates the effect of the levy scheme on the probability of leaving the training facility (parameter β) according to Eq. (1). Each estimation controls for the covariates summarized in Table 1. Estimates are performed by the Cox hazard model and by logit regressions. Estimations are stratified with respect to several chosen characteristics. In Model 1, we exclude graduates who reside in federal states of East Germany. In Model 2, the estimation exclusively considers female graduates, and in Model 3, the estimation is restricted to persons who graduated from lower secondary school or secondary school before the apprenticeship training. In Model 4, the estimation only considers graduates of inpatient care facilities and excludes graduates from ambulatory nursing services. Finally, Models 5 and 6 compare the treatment effect between small and large care facilities, where the median number of employees per facility (86.5) determines this categorization.

The effect of mobility on wages
The previous sections demonstrate that the introduction of the levy scheme significantly affected the decision to employ graduates upon completion of apprenticeship training. The F statistic of 74.6 in Model 3 of Table 2 indicates that the levy scheme can be used as a strong instrument for mobility patterns. In this section, we take advantage of this first stage and estimate Eq. (2). As the first stage, we take the estimated probability of leaving the training facility according to Model 3 of Table 2 and estimate the effect of this probability on earnings in the second stage. It should be noted that the SIAB only reports the daily wage of full-time employed persons. This is the group we focus on in the following.
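The logic of using the levy scheme as an instrument can be illustrated with a simpler estimator than the two-step procedure described above. The sketch below uses the Wald estimator for a binary instrument, which is not the specification estimated in this paper; the function `wald_iv` and all data values are hypothetical.

```python
from statistics import mean


def wald_iv(y, d, z):
    """Wald estimator: difference in mean outcomes by instrument status,
    divided by the difference in mean treatment take-up by instrument status."""
    y1 = mean([yi for yi, zi in zip(y, z) if zi == 1])
    y0 = mean([yi for yi, zi in zip(y, z) if zi == 0])
    d1 = mean([di for di, zi in zip(d, z) if zi == 1])
    d0 = mean([di for di, zi in zip(d, z) if zi == 0])
    first_stage = d1 - d0
    if first_stage == 0:
        raise ValueError("instrument does not shift mobility; estimator undefined")
    return (y1 - y0) / first_stage


# Hypothetical data: z = levy scheme in place, d = left the training facility,
# y = daily wage in euros.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [80.0, 82.0, 78.0, 88.0, 81.0, 86.0, 85.0, 88.0]

estimate = wald_iv(y, d, z)
print(estimate)  # prints -6.0
```

The denominator is the first stage: the estimator is only informative if the instrument shifts mobility noticeably, which is why the strength of the first stage matters.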
Panel a of Figure 2 displays the wage distribution of graduates who left the training facility and of graduates who stayed at the training facility. Without accounting for the endogeneity in the decision whether to leave the training facility or not, facility switchers appear to obtain lower wages than stayers (difference in means: €6.5). By contrast, Panel b illustrates that graduates who began their apprenticeship training under the regime of the levy scheme obtain higher wages than graduates in the control group. However, the difference in means is small (€2.7). Table 5 tests whether the mobility of graduates affects wages after conditioning on covariates and after instrumenting the endogenous mobility pattern.
Fig. 2 The wage distribution by moving pattern and treatment status. a Displays the wage distribution (kernel density) of graduates who leave the training facility within the first 100 days after graduation and of graduates who do not leave their training facility within the first 100 days after graduation. In b, the wage distribution is compared between graduates who started their apprenticeship training under the regime of the levy scheme (treatment group) and graduates who started their apprenticeship training when the levy scheme had not (yet) been introduced. Source: Sample of Integrated Labour Market Biographies 1975-2017 (SIAB 7517v1); Regional Database of the Federal Statistical Office and the Federal Employment Agency, own illustration

Model 1 of Table 5 presents results of estimating the effect of the probability of leaving the training facility on wages. When we control for characteristics of graduates and care facilities and for the (lagged) number of school graduates in counties (see Table 2), Model 1 of Table 5 shows a negative coefficient that is not significantly different from zero. In Models 2-4, we use binary information about wages. For example, in Model 2, we estimate the effect of a facility switch on the probability of earning a wage that exceeds the first quartile of the wage distribution. While we find no significant effect on the probability of earning a wage that exceeds the first quartile or the median of the wage distribution, we find a weakly significant negative effect on the probability of earning a wage that exceeds the third quartile of the wage distribution. A facility switch also has a negative effect on the differential between wages after graduation and the salary during apprenticeship training. However, again, the effect is only weakly significant.
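The binary wage outcomes can be constructed from the quartiles of the wage distribution. The following is a minimal sketch with hypothetical daily wages; the choice of `method="inclusive"` for the quartile computation is our assumption, since the text does not state how the cut points were calculated.

```python
from statistics import quantiles

# Hypothetical daily wages in euros, for illustration only.
wages = [62.0, 70.0, 74.0, 78.0, 81.0, 85.0, 90.0, 97.0]

# First quartile, median, and third quartile of the wage distribution.
q1, q2, q3 = quantiles(wages, n=4, method="inclusive")


def above(threshold: float) -> list:
    """Binary indicator: 1 if the wage exceeds the threshold, else 0."""
    return [int(w > threshold) for w in wages]


above_q1 = above(q1)  # outcome of the kind used in Model 2
above_q2 = above(q2)  # outcome of the kind used in Model 3
above_q3 = above(q3)  # outcome of the kind used in Model 4

print(sum(above_q1), sum(above_q2), sum(above_q3))
```

Each indicator then replaces the continuous wage as the dependent variable in the second stage.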

Conclusion
Although the relationship between apprenticeship costs and firms' decision whether to continue employing graduates after the completion of apprenticeship training is highly relevant, evidence on the causal effect of apprenticeship costs is scarce. This paper addresses this research gap with a case study on geriatric nurses in Germany.
The German labor market for geriatric nurses is of particular relevance for research on vocational education because this sector is characterized by a large labor supply shortage. In addition, information about apprenticeship costs and about the motivation for providing apprenticeship training in this sector is close to non-existent. Furthermore, the focus on apprentices in geriatric nursing in Germany makes it possible to exploit a unique quasi-experiment.
We consider the introduction of a training levy that redistributes a substantial part of apprenticeship costs between care facilities that provide training for (potential) geriatric nurses and facilities that do not. We take advantage of the fact that the underlying apprenticeship levy was introduced across the federal states at different points in time. By considering this exogenous reduction in apprenticeship costs, we are able to remove a substantial part of the endogeneity in the relationship between apprenticeship costs and retention of graduates. Furthermore, this underlines the role of the institutional setting in the context of firms' motivation to train and in the context of the mobility of graduates from apprenticeship training. We find that the introduction of the levy scheme increases the probability that graduates leave their employer after completing apprenticeship training by 10 percentage points. Further analyses demonstrate that this effect is driven neither by simultaneous reforms of the German care sector nor by general trends in the apprenticeship market for care workers. Furthermore, the effect of the levy scheme on graduates' mobility is above average in small care facilities, while it is below average but significant in large care facilities.

Table 5 The effect of mobility on wages
Stars denote significance of coefficients: * p < 10%, ** p < 5%, *** p < 1%; standard errors clustered at the level of federal states are in parentheses. This table provides results from estimating the effect of leaving the training facility on wages. To perform a two-step instrumental variables (IV) estimation, we take the estimated probability of leaving the training care facility ($mobility_{ijt}$) from estimating Eq. (1). In the second step, we estimate the effect of this estimated probability on earnings ($wage_{ijt}$) in Model 1 and on the differential between wages and apprentices' salary during apprenticeship training in Model 5. In Models 2-4, we use binary information about wages. For example, in Model 2, we estimate the effect of a facility switch on the probability of earning a wage that exceeds the first quartile of the wage distribution.
We interpret this finding to mean that a substantial share of care facilities in Germany follows the production approach of Lindley (1975). This means that these care facilities see apprentices in geriatric nursing mainly as cheap labor input and follow the strategy of covering apprenticeship costs by benefits during apprenticeship training. Although we cannot explicitly identify the motivation to train for each care facility included in our sample and although there is no data set available that captures the costs and benefits of apprenticeship training in the care sector, our quasi-experiment helps to draw some first conclusions about the motivation to train of care facilities.
To draw a more complete picture of the effect of the levy scheme, we also apply instrumental variable regression in order to estimate the effect of the mobility of graduates on their wages. By doing this, we contribute a novel approach to the strand of literature that considers wages and the endogeneity of the mobility of graduates (e.g., Wagner and Wolf 2013; Fitzenberger et al. 2015; Mohrenweiser and Zwick 2015; Dummert 2020). This novel approach is an institutional approach to solving the issue that mobility patterns of graduates are endogenous and selective. We find some indications of a negative relationship between mobility and wages, in that leaving the training facility decreases the probability of earning a wage in the top quartile of the wage distribution. A linear relationship between the mobility of graduates and their wages is not found. This can be explained by the features of the labor market for geriatric nurses in Germany. Because there is a large excess demand for geriatric nurses in Germany and the majority of wages are set by collective agreements, the variation in wages is low.

Table 6 The balance of the treatment: The probability of being in the treatment group for the pre-reform periods by characteristics of graduates, care facilities, and regions
Stars denote significance of coefficients: * p < 10%, ** p < 5%, *** p < 1%; standard errors clustered at the level of federal states are in parentheses. This table uses pooled OLS and regresses a binary treatment variable on explanatory variables for the pre-reform periods. It should be noted that the dependent variable equals one if the graduate resides in a federal state that has not yet introduced the levy scheme but will do so later. The treatment indicator thus equals zero if the graduate resides in a federal state that will never introduce the levy scheme. The table controls for the variables summarized in Table 1.