Reputation: 37
I have to calculate the date difference in cumulative order in Start_Dur column. I shared the below sample for reference.
My dataset:
Mno Record_date Fv_Date
100003 15-05-2004 26-09-2003
100003 09-12-2009 26-09-2003
100003 28-04-2012 26-09-2003
100045 29-01-2011 27-09-2003
100202 14-11-2006 02-10-2003
100202 15-10-2009 02-10-2003
Expected Output:
Mno Record_date Fv_Date Start_Dur End_Dur
100003 15-05-2004 26-09-2003 0 0.6352
100003 09-12-2009 26-09-2003 0.6352 6.204
100003 28-04-2012 26-09-2003 6.204 8.5886
100045 29-01-2011 27-09-2003 0 7.3402
100202 14-11-2006 02-10-2003 0 3.1184
100202 15-10-2009 02-10-2003 3.1184 6.037
End_Dur calculated by (Fv_date - Record_date) but the starting point/First row in Start_Dur is 0 in default. the second row of Start_Dur should be the End_Dur of previous row.
Kindly suggest a code to solve the problem. Thanks in advance!
Upvotes: 0
Views: 53
Reputation: 389235
You can convert the columns to Date class, get the difference between them and divide by 365 to get End_Dur
. You can get Start_Dur
by taking lag
value of End_Dur
for each Mno
.
library(dplyr)
df %>%
mutate(across(ends_with('Date'), as.Date, "%d-%m-%Y"),
End_Dur = as.numeric((Record_date - Fv_Date)/365)) %>%
group_by(Mno) %>%
mutate(Start_Dur = lag(End_Dur, default = 0))
# Mno Record_date Fv_Date End_Dur Start_Dur
# <int> <date> <date> <dbl> <dbl>
#1 100003 2004-05-15 2003-09-26 0.636 0
#2 100003 2009-12-09 2003-09-26 6.21 0.636
#3 100003 2012-04-28 2003-09-26 8.59 6.21
#4 100045 2011-01-29 2003-09-27 7.35 0
#5 100202 2006-11-14 2003-10-02 3.12 0
#6 100202 2009-10-15 2003-10-02 6.04 3.12
data
df <- structure(list(Mno = c(100003L, 100003L, 100003L, 100045L, 100202L,
100202L), Record_date = c("15-05-2004", "09-12-2009", "28-04-2012",
"29-01-2011", "14-11-2006", "15-10-2009"), Fv_Date = c("26-09-2003",
"26-09-2003", "26-09-2003", "27-09-2003", "02-10-2003", "02-10-2003"
)), class = "data.frame", row.names = c(NA, -6L))
Upvotes: 2