Reputation: 81
I have the following data (this is a sample, my dataset is quite large):
ID| Date
A | 2010-12-30
A | 2010-12-13
A | 2010-08-23
B | 2011-06-24
B | 2011-06-13
B | 2010-02-20
What I need to do is to calculate the difference between dates, based on the ID. The calculation must start from the first row and subtract the date on the row below.
So for the above data, the desired output would be the DateDiff column below:
ID| Date | DateDiff
A | 2010-12-30 | 17 (which is 2010-12-30 - 2010-12-13)
A | 2010-12-13 | 112 (which is 2010-12-13 - 2010-08-23)
A | 2010-08-23 | 0 (this result should be 0 as the ID (A) does not match the ID below (B)
B | 2011-06-24 | 11 (which is 2011-06-24 - 2011-06-13)
B | 2011-06-13 | 478 (which is 2011-06-13 - 2010-02-20)
B | 2010-02-20 | 0 (this result is 0 again as there is no ID in the next row thus the ID (B) does not match the ID below)
I have used the following code which is close to the desired result:
df$DateDiff <- ave(as.numeric(df$DATE), df$ID, FUN=function(x) c(0,abs(diff(x))))
However, it calculates by subtracting the second date from the first, and therefore entering in 0 in the first row, like so:
ID| Date | DateDiff
A | 2010-12-30 | 0 (as there is no date above)
A | 2010-12-13 | 17
A | 2010-08-23 | 112 (as it calculates the diff between the first date in ID (B) from the last in ID (A)
B | 2011-06-24 | 0
B | 2011-06-13 | 11
B | 2010-02-20 | 478
As you can see, the result is close but not quite there. I have searched long and hard but have not been able to locate a solution.
Upvotes: 1
Views: 1713
Reputation: 1162
It looks to me like your code is perfect, EXCEPT in the function you've created it should be c(abs(diff(x)),0)
instead. i.e. calculate the difference then put a 0 on the end.
Example:
ID <- c("A", "A", "A", "B", "B", "B")
DATE <- as.Date(c("2010-12-30",
"2010-12-13",
"2010-08-23",
"2011-06-2",
"2011-06-13",
"2010-02-20"
))
df <- data.frame(ID, DATE)
df$DateDiff <- ave(as.numeric(df$DATE), df$ID, FUN=function(x) c(abs(diff(x)),0))
And here's the output:
Upvotes: 3
Reputation: 79208
library(data.table)
setDT(data)[,.(Date,c(-diff(Date),0)),by=ID]
ID Date V2
1: A 2010-12-30 17 days
2: A 2010-12-13 112 days
3: A 2010-08-23 0 days
4: B 2011-06-24 11 days
5: B 2011-06-13 478 days
6: B 2010-02-20 0 days
Base R:
transform(data,A=unlist(by(Date,ID,FUN=function(x)c(-diff(x),0))))
ID Date A
1: A 2010-12-30 17
2: A 2010-12-13 112
3: A 2010-08-23 0
4: B 2011-06-24 11
5: B 2011-06-13 478
6: B 2010-02-20 0
Upvotes: 1
Reputation: 13581
You could use dplyr::lead
to shift your Date
by 1
df %>%
group_by(ID) %>%
mutate(DateDiff = abs(dplyr::lead(Date, 1, default=NA) - Date))
# A tibble: 6 x 3
# Groups: ID [2]
# ID Date DateDiff
# <chr> <date> <time>
# 1 A 2010-12-30 17
# 2 A 2010-12-13 112
# 3 A 2010-08-23 <NA>
# 4 B 2011-06-24 11
# 5 B 2011-06-13 478
# 6 B 2010-02-20 <NA>
df <- read.table(text="ID Date
A 2010-12-30
A 2010-12-13
A 2010-08-23
B 2011-06-24
B 2011-06-13
B 2010-02-20", header=TRUE, stringsAsFactors=FALSE)
library(lubridate)
df$Date <- ymd(df$Date)
Upvotes: 1