Albert
Albert

Reputation: 153

Calculate number of years worked with different end dates

Consider the following two datasets. The first dataset describes an id variable that identifies a person and the date when his or her unemployment benefits starts.

The second dataset shows the number of service years, which makes it possible to calculate the maximum entitlement period. More precisely, each year denotes a dummy variable, which is equal to unity in case someone build up unemployment benefits rights in a particular year (i.e. if someone worked). If this is not the case, this variable is equal to zero.

df1<-data.frame( c("R005", "R006", "R007"), c(20120610, 20130115, 20141221))
colnames(df1)<-c("id", "start_UI")

df1$start_UI<-as.character(df1$start_UI)
df1$start_UI<-as.Date(df1$start_UI, "%Y%m%d")

df2<-data.frame( c("R005", "R006", "R007"), c(1,1,1), c(1,1,1), c(0,1,1), c(1,0,1), c(1,0,1) ) 

colnames(df2)<-c("id", "worked2010", "worked2011", "worked2012", "worked2013", "worked2014")

Just to summarize the information from the above two datasets. We see that person R005 worked in the years 2010 and 2011. In 2012 this person filed for Unemployment insurance. Thereafter person R005 works again in 2013 and 2014 (we see this information in dataset df2). When his unemployment spell started in 2012, his entitlement was based on the work history before he got unemployed. Hence, the work history is equal to 2. In a similar vein, the employment history for R006 and R007 is equal to 3 and 5, respectively (for R007 we assume he worked in 2014 as he only filed for unemployment benefits in December of that year. Therefore the number is 5 instead of 4).

Now my question is how I can merge these two datasets effectively such that I can get the following table

df_final<- data.frame(c("R005", "R006", "R007"), c(20120610, 20130115, 20141221), c(2,3,5))
colnames(df_final)<-c("id", "start_UI", "employment_history")

    id start_UI employment_history
1 R005 20120610                  2
2 R006 20130115                  3
3 R007 20141221                  5

I tried using "aggregate", but in that case I also include work history after the year someone filed for unemployment benefits and that is something I do not want. Does anyone have an efficient way how to combine the information from the two above datasets and calculate the unemployment history?

I appreciate any help.

Upvotes: 1

Views: 73

Answers (1)

Ma&#235;l
Ma&#235;l

Reputation: 52249

base R

You should use Reduce with accumulate = T.

df2$employment_history <- apply(df2[,-1], 1, function(x) sum(!Reduce(any, x==0, accumulate = TRUE)))
merge(df1, df2[c("id","employment_history")])

dplyr

Or use the built-in dplyr::cumany function:

df2 %>% 
  pivot_longer(-id) %>% 
  group_by(id) %>% 
  summarise(employment_history = sum(value[!cumany(value == 0)])) %>% 
  left_join(df1, .)

Output

    id   start_UI employment_history
1 R005 2012-06-10                  2
2 R006 2013-01-15                  3
3 R007 2014-12-21                  5

Upvotes: 2

Related Questions