Python Newbie
Python Newbie

Reputation: 277

Finding Difference Between Dates in Long format

I have a dataset in long form for start and end date. for each id you will see multiple start and end dates. I need to find the difference between the first end date and second start date. I am not sure how to use two rows to calculate the difference. Any help is appreciated.

df=data.frame(c(1,2,2,2,3,4,4),
              as.Date(c( "2010-10-01","2009-09-01","2014-01-01","2014-02-01","2009-01-01","2013-03-01","2014-03-01")),
              as.Date(c("2016-04-30","2013-12-31","2014-01-31","2016-04-30","2014-02-28","2013-05-01","2014-08-31"))); 
names(df)=c('id','start','end')

my output would look like this:

 df$diff=c(NA,1,1,NA,NA,304, NA)

Upvotes: 2

Views: 463

Answers (3)

pmavuluri
pmavuluri

Reputation: 141

Again with base R, you can do the following:

df$noofdays <- as.numeric(as.difftime(df$end-df$start, units=c("days"), format="%Y-%m-%d"))

Upvotes: 0

thelatemail
thelatemail

Reputation: 93813

Here's an attempt in base R that I think does what you want:

df$diff <- NA
split(df$diff, df$id) <- by(df, df$id, FUN=function(SD) c(SD$start[-1], NA) - SD$end)

df
#  id      start        end diff
#1  1 2010-10-01 2016-04-30   NA
#2  2 2009-09-01 2013-12-31    1
#3  2 2014-01-01 2014-01-31    1
#4  2 2014-02-01 2016-04-30   NA
#5  3 2009-01-01 2014-02-28   NA
#6  4 2013-03-01 2013-05-01  304
#7  4 2014-03-01 2014-08-31   NA

Alternatively, in data.table it would be:

setDT(df)[, diff := shift(start,n=1,type="lead") - end, by=id]

Upvotes: 2

lukeA
lukeA

Reputation: 54237

Here's an alternative using the popular dplyr package:

library(dplyr)
df %>% 
  group_by(id) %>%
  mutate(diff = difftime(lead(start), end, units = "days"))
#      id      start        end     diff
#   (dbl)     (date)     (date)   (dfft)
# 1     1 2010-10-01 2016-04-30  NA days
# 2     2 2009-09-01 2013-12-31   1 days
# 3     2 2014-01-01 2014-01-31   1 days
# 4     2 2014-02-01 2016-04-30  NA days
# 5     3 2009-01-01 2014-02-28  NA days
# 6     4 2013-03-01 2013-05-01 304 days
# 7     4 2014-03-01 2014-08-31  NA days

You can wrap diff in as.numeric if you want.

Upvotes: 1

Related Questions