Reputation: 117
I'm stuck on how to conditionally lag a date variable within the same ID group, but skip the next date value and lag the one after if it is the same as the original date. To illustrate,
data have
ID Value Date
a 50 1987-01-01
a 52 1987-01-01
a 33 1989-05-03
b 10 1979-08-28
b 52 1988-02-23
b 45 1988-02-23
b 33 1988-02-23
b 40 1990-05-20
data want
ID Value Date Lead_date
a 50 1987-01-01 1989-05-03
a 52 1987-01-01 1989-05-03
a 33 1989-05-03 NA
b 10 1979-08-28 1988-02-23
b 52 1988-02-23 1990-05-20
b 45 1988-02-23 1990-05-20
b 33 1988-02-23 1990-05-20
b 40 1990-05-20 NA
I'm not sure if there is an elegant way of doing this, or if I need a for loop?
Any advice is much appreciated!
Upvotes: 0
Views: 107
Reputation: 269634
1) findInterval
will find the position of the last Date matching the current one so just add 1 to get the position of the next Date.
library(data.table)
setDT(DF)
DF[, Lead_date := Date[findInterval(Date, Date) + 1], by = ID]
DF
## ID Value Date Lead_date
## 1: a 50 1987-01-01 1989-05-03
## 2: a 52 1987-01-01 1989-05-03
## 3: a 33 1989-05-03 <NA>
## 4: b 10 1979-08-28 1988-02-23
## 5: b 52 1988-02-23 1990-05-20
## 6: b 45 1988-02-23 1990-05-20
## 7: b 33 1988-02-23 1990-05-20
## 8: b 40 1990-05-20 <NA>
2) An alternative is to get the position of the match of Date to the unique dates and then take the next unique date:
library(data.table)
setDT(DF)
DF[, Lead_date := { u <- unique(Date); u[match(Date, u) + 1] }, by = ID]
3) Another way is to replace duplicated dates with NA and then use na.locf0 to fill them in taking the next date.
library(data.table)
library(zoo)
setDF(DF)
DF[, Lead_Date := replace(Date, duplicated(Date), NA) |>
na.locf0(fromLast = TRUE) |>
shift(-1), by = ID]
Lines <- "ID Value Date
a 50 1987-01-01
a 52 1987-01-01
a 33 1989-05-03
b 10 1979-08-28
b 52 1988-02-23
b 45 1988-02-23
b 33 1988-02-23
b 40 1990-05-20"
DF <- read.table(text = Lines, header = TRUE)
DF$Date <- as.Date(DF$Date)
Upvotes: 5
Reputation: 6459
do.call(rbind, lapply(split(d, d["ID"]),
function(x) {
ld <- rle(x$Date)
ld$values <- c(ld$values[-1], NA)
x$Lead_date <- inverse.rle(ld)
x}
))
d <- read.table(text="ID Value Date
a 50 1987-01-01
a 52 1987-01-01
a 33 1989-05-03
b 10 1979-08-28
b 52 1988-02-23
b 45 1988-02-23
b 33 1988-02-23
b 40 1990-05-20", header=TRUE)
Upvotes: 1