Matt
Matt

Reputation: 81

R How to subtract date from next date below and group by?

I have the following data (this is a sample, my dataset is quite large):

ID| Date        
A | 2010-12-30 
A | 2010-12-13 
A | 2010-08-23 
B | 2011-06-24 
B | 2011-06-13 
B | 2010-02-20 

What I need to do is to calculate the difference between dates, based on the ID. The calculation must start from the first row and subtract the date on the row below.

So for the above data, the desired output would be the DateDiff column below:

ID| Date        | DateDiff
A | 2010-12-30  | 17       (which is 2010-12-30 - 2010-12-13) 
A | 2010-12-13  | 112      (which is 2010-12-13 - 2010-08-23) 
A | 2010-08-23  | 0        (this result should be 0 as the ID (A) does not match the ID below (B)   
B | 2011-06-24  | 11       (which is 2011-06-24 - 2011-06-13)
B | 2011-06-13  | 478      (which is 2011-06-13 - 2010-02-20)
B | 2010-02-20  | 0        (this result is 0 again as there is no ID in the next row thus the ID (B) does not match the ID below)

I have used the following code which is close to the desired result:

df$DateDiff <- ave(as.numeric(df$DATE), df$ID, FUN=function(x) c(0,abs(diff(x))))

However, it calculates by subtracting the second date from the first, and therefore entering in 0 in the first row, like so:

ID| Date        | DateDiff
A | 2010-12-30  | 0 (as there is no date above)       
A | 2010-12-13  | 17
A | 2010-08-23  | 112 (as it calculates the diff between the first date in ID (B) from the last in ID (A)  
B | 2011-06-24  | 0
B | 2011-06-13  | 11 
B | 2010-02-20  | 478        

As you can see, the result is close but not quite there. I have searched long and hard but have not been able to locate a solution.

Upvotes: 1

Views: 1713

Answers (3)

LachlanO
LachlanO

Reputation: 1162

It looks to me like your code is perfect, EXCEPT in the function you've created it should be c(abs(diff(x)),0) instead. i.e. calculate the difference then put a 0 on the end.

Example:

ID <- c("A", "A", "A", "B", "B", "B")
DATE <- as.Date(c("2010-12-30",
            "2010-12-13",
            "2010-08-23",
            "2011-06-2",
            "2011-06-13",
            "2010-02-20"
))
df <- data.frame(ID, DATE)

df$DateDiff <- ave(as.numeric(df$DATE), df$ID, FUN=function(x) c(abs(diff(x)),0))

And here's the output:

enter image description here

Upvotes: 3

Onyambu
Onyambu

Reputation: 79208

library(data.table)
setDT(data)[,.(Date,c(-diff(Date),0)),by=ID]
                   ID       Date       V2
1:                 A  2010-12-30  17 days
2:                 A  2010-12-13 112 days
3:                 A  2010-08-23   0 days
4:                 B  2011-06-24  11 days
5:                 B  2011-06-13 478 days
6:                 B  2010-02-20   0 days

Base R:

transform(data,A=unlist(by(Date,ID,FUN=function(x)c(-diff(x),0))))
                   ID       Date   A
1:                 A  2010-12-30  17
2:                 A  2010-12-13 112
3:                 A  2010-08-23   0
4:                 B  2011-06-24  11
5:                 B  2011-06-13 478
6:                 B  2010-02-20   0

Upvotes: 1

CPak
CPak

Reputation: 13581

You could use dplyr::lead to shift your Date by 1

df %>%
  group_by(ID) %>%
  mutate(DateDiff = abs(dplyr::lead(Date, 1, default=NA) - Date))

# A tibble: 6 x 3
# Groups: ID [2]
  # ID    Date       DateDiff
  # <chr> <date>     <time>  
# 1 A     2010-12-30 17      
# 2 A     2010-12-13 112     
# 3 A     2010-08-23 <NA>    
# 4 B     2011-06-24 11      
# 5 B     2011-06-13 478     
# 6 B     2010-02-20 <NA> 

df <- read.table(text="ID Date        
A 2010-12-30 
A 2010-12-13 
A 2010-08-23 
B 2011-06-24 
B 2011-06-13 
B 2010-02-20", header=TRUE, stringsAsFactors=FALSE)
library(lubridate)
df$Date <- ymd(df$Date)

Upvotes: 1

Related Questions