shoorideh
shoorideh

Reputation: 173

Time interval Calculation for duplicated ID's in R

I have a large dataset.

  1. I want to create a column to show the number of days between the start and the end date (from the previous row) for each duplicated ID. For example, for R1, since it is not duplicated, I will not calculate the interval. For R2, first, I need to sort it in an increasing way based on the start date. Then I calculate the number of days between the second earliest start date and the end date from the previous row. Next I continue to calculate the number of days between the third earliest start date and the end date from the second earliest start date, and so on. I want to do it for any other duplicated Ids too.

  2. Then I want to create a new column to calculate the number of days in the same way as part one, for duplicated ID with the same levels of event. I was wondering how I can do that.


ID<-c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
START<-c("3-4-2013","4-5-2018","4-5-2015","4-6-2011","5-5-2012","1-9-2010","23-4-1999","25-6-2011","3-6-2011","4-5-2014",
    "6-6-2016","5-7-2014","7-7-1990","3-3-1998","4-4-1990","7-8-2014","22-4-1970","23-5-1984")
End<-c("3-4-2014","4-5-2019","5-5-2015","4-6-2013","5-5-2014","1-9-2012","23-4-2010","25-6-2015","3-6-2013","6-5-2014",
    "6-8-2016","5-8-2014","7-9-1990","3-7-1998","4-9-1990","7-12-2014","22-7-1970","23-8-1984")
event<-c("a","b","b","s","s","f","f","b","b","a","a","a","s","c","c","b","m","a")
df<-data.frame(ID,START,End,event)

So the results would be something like this:

ID     START       End     event   Time1                     Time2
1  R1  3-4-2013  3-4-2014     a     NA                        NA
14 R2  3-3-1998  3-7-1998     c     NA                        NA
15 R2  4-4-1990  4-9-1990     c    (4-4-1990)-(3-7-1998)   (4-4-1990)-(3-7-1998)
3  R2  4-5-2015  5-5-2015     b    (4-5-2015)-(4-9-1990)      NA
2  R2  4-5-2018  4-5-2019     b    (4-5-2018)-(5-5-2015)   (4-5-2018)-(5-5-2015)
16 R2  7-8-2014 7-12-2014     b    (7-8-2014)-(4-5-2019)   (7-8-2014)-(4-5-2019)
10 R3  4-5-2014  6-5-2014     a     NA                        NA
4  R3  4-6-2011  4-6-2013     s    (4-6-2011)-(6-5-2014)      NA
5  R3  5-5-2012  5-5-2014     s    (5-5-2012)-(4-6-2013)   (5-5-2012)-(4-6-2013)                    
12 R3  5-7-2014  5-8-2014     a    (5-7-2014)-(5-5-2014)   (5-7-2014)-(6-5-2014)
11 R3  6-6-2016  6-8-2016     a    (6-6-2016)-(5-8-2014)   (6-6-2016)-(5-8-2014)
13 R3  7-7-1990  7-9-1990     s                            (7-7-1990)-(5-5-2014)
6  R4  1-9-2010  1-9-2012     f
7  R4 23-4-1999 23-4-2010     f
8  R4 25-6-2011 25-6-2015     b
9  R4  3-6-2011  3-6-2013     b
17 R5 22-4-1970 22-7-1970     m
18 R6 23-5-1984 23-8-1984     a
> 

Upvotes: 0

Views: 480

Answers (1)

Gopala
Gopala

Reputation: 10473

One way to achieve this is using dplyr package as follows (after fixing your data frame as I show below):

library(dplyr)
df<-data.frame(ID,START,End,event, stringsAsFactors = FALSE)
df$START <- as.Date(df$START, format = '%d-%m-%Y')
df$End <- as.Date(df$End, format = '%d-%m-%Y')
df %>% arrange(ID, START, End) %>% group_by(ID) %>% mutate(laggedTimeElapsed = difftime(START, lag(End), units = 'days'))

Not sure exactly what you want in #2 above, but, if you are trying to create 'event duration' within a given row, you simply do the following:

df %>% arrange(ID, START, End) %>% group_by(ID) %>% mutate(laggedTimeElapsed = difftime(START, lag(End), units = 'days'), eventDuration = difftime(End, START, units = 'days'))

Output here:

Source: local data frame [18 x 6]
Groups: ID [6]

      ID      START        End event laggedTimeElapsed eventDuration
   (chr)     (date)     (date) (chr)            (dfft)        (dfft)
1     R1 2013-04-03 2014-04-03     a           NA days      365 days
2     R2 1990-04-04 1990-09-04     c           NA days      153 days
3     R2 1998-03-03 1998-07-03     c         2737 days      122 days
4     R2 2014-08-07 2014-12-07     b         5879 days      122 days
5     R2 2015-05-04 2015-05-05     b          148 days        1 days
6     R2 2018-05-04 2019-05-04     b         1095 days      365 days
7     R3 1990-07-07 1990-09-07     s           NA days       62 days
8     R3 2011-06-04 2013-06-04     s         7575 days      731 days
9     R3 2012-05-05 2014-05-05     s         -395 days      730 days
10    R3 2014-05-04 2014-05-06     a           -1 days        2 days
11    R3 2014-07-05 2014-08-05     a           60 days       31 days
12    R3 2016-06-06 2016-08-06     a          671 days       61 days
13    R4 1999-04-23 2010-04-23     f           NA days     4018 days
14    R4 2010-09-01 2012-09-01     f          131 days      731 days
15    R4 2011-06-03 2013-06-03     b         -456 days      731 days
16    R4 2011-06-25 2015-06-25     b         -709 days     1461 days
17    R5 1970-04-22 1970-07-22     m           NA days       91 days
18    R6 1984-05-23 1984-08-23     a           NA days       92 days

Upvotes: 1

Related Questions