Jammy
Jammy

Reputation: 37

Calculate date difference in cumulative order in R

I have to calculate the date difference in cumulative order in Start_Dur column. I shared the below sample for reference.

My dataset:

Mno     Record_date Fv_Date     
100003  15-05-2004  26-09-2003  
100003  09-12-2009  26-09-2003  
100003  28-04-2012  26-09-2003  
100045  29-01-2011  27-09-2003  
100202  14-11-2006  02-10-2003  
100202  15-10-2009  02-10-2003

Expected Output:

Mno     Record_date Fv_Date     Start_Dur   End_Dur
100003  15-05-2004  26-09-2003  0           0.6352
100003  09-12-2009  26-09-2003  0.6352      6.204
100003  28-04-2012  26-09-2003  6.204       8.5886
100045  29-01-2011  27-09-2003  0           7.3402
100202  14-11-2006  02-10-2003  0           3.1184
100202  15-10-2009  02-10-2003  3.1184      6.037

End_Dur calculated by (Fv_date - Record_date) but the starting point/First row in Start_Dur is 0 in default. the second row of Start_Dur should be the End_Dur of previous row.

Kindly suggest a code to solve the problem. Thanks in advance!

Upvotes: 0

Views: 53

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389235

You can convert the columns to Date class, get the difference between them and divide by 365 to get End_Dur. You can get Start_Dur by taking lag value of End_Dur for each Mno.

library(dplyr)

df %>%
  mutate(across(ends_with('Date'), as.Date, "%d-%m-%Y"), 
         End_Dur = as.numeric((Record_date - Fv_Date)/365)) %>%
  group_by(Mno) %>%
  mutate(Start_Dur = lag(End_Dur, default = 0))

#     Mno Record_date Fv_Date    End_Dur Start_Dur
#   <int> <date>      <date>       <dbl>     <dbl>
#1 100003 2004-05-15  2003-09-26   0.636     0    
#2 100003 2009-12-09  2003-09-26   6.21      0.636
#3 100003 2012-04-28  2003-09-26   8.59      6.21 
#4 100045 2011-01-29  2003-09-27   7.35      0    
#5 100202 2006-11-14  2003-10-02   3.12      0    
#6 100202 2009-10-15  2003-10-02   6.04      3.12 

data

df <- structure(list(Mno = c(100003L, 100003L, 100003L, 100045L, 100202L, 
100202L), Record_date = c("15-05-2004", "09-12-2009", "28-04-2012", 
"29-01-2011", "14-11-2006", "15-10-2009"), Fv_Date = c("26-09-2003", 
"26-09-2003", "26-09-2003", "27-09-2003", "02-10-2003", "02-10-2003"
)), class = "data.frame", row.names = c(NA, -6L))

Upvotes: 2

Related Questions