Hong
Hong

Reputation: 117

Conditional lag with dates in R - how to code skip to next date if dates are the same?

I'm stuck on how to conditionally lag a date variable within the same ID group, but skip the next date value and lag the one after if it is the same as the original date. To illustrate,

data have
ID Value Date
a 50 1987-01-01
a 52 1987-01-01
a 33 1989-05-03
b 10 1979-08-28
b 52 1988-02-23
b 45 1988-02-23
b 33 1988-02-23
b 40 1990-05-20

data want
ID Value Date Lead_date
a 50 1987-01-01 1989-05-03
a 52 1987-01-01 1989-05-03
a 33 1989-05-03 NA
b 10 1979-08-28 1988-02-23
b 52 1988-02-23 1990-05-20
b 45 1988-02-23 1990-05-20
b 33 1988-02-23 1990-05-20
b 40 1990-05-20 NA

I'm not sure if there is an elegant way of doing this, or if I need a for loop?

Any advice is much appreciated!

Upvotes: 0

Views: 107

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269634

1) findInterval will find the position of the last Date matching the current one so just add 1 to get the position of the next Date.

library(data.table)

setDT(DF)
DF[, Lead_date := Date[findInterval(Date, Date) + 1], by = ID]

DF
##    ID Value       Date  Lead_date
## 1:  a    50 1987-01-01 1989-05-03
## 2:  a    52 1987-01-01 1989-05-03
## 3:  a    33 1989-05-03       <NA>
## 4:  b    10 1979-08-28 1988-02-23
## 5:  b    52 1988-02-23 1990-05-20
## 6:  b    45 1988-02-23 1990-05-20
## 7:  b    33 1988-02-23 1990-05-20
## 8:  b    40 1990-05-20       <NA>

2) An alternative is to get the position of the match of Date to the unique dates and then take the next unique date:

library(data.table)
setDT(DF)
DF[, Lead_date := { u <- unique(Date); u[match(Date, u) + 1] }, by = ID]

3) Another way is to replace duplicated dates with NA and then use na.locf0 to fill them in taking the next date.

library(data.table)
library(zoo)

setDF(DF)
DF[, Lead_Date := replace(Date, duplicated(Date), NA) |>
                    na.locf0(fromLast = TRUE) |>
                    shift(-1), by = ID]

Note

Lines <- "ID Value Date
a 50 1987-01-01
a 52 1987-01-01
a 33 1989-05-03
b 10 1979-08-28
b 52 1988-02-23
b 45 1988-02-23
b 33 1988-02-23
b 40 1990-05-20"
DF <- read.table(text = Lines, header = TRUE)
DF$Date <- as.Date(DF$Date)

Upvotes: 5

lebatsnok
lebatsnok

Reputation: 6459

do.call(rbind, lapply(split(d, d["ID"]),
                      function(x) {
                        ld <- rle(x$Date)
                        ld$values <- c(ld$values[-1], NA)
                        x$Lead_date <- inverse.rle(ld)
                        x}
                      
))

The data

d <- read.table(text="ID Value Date
a 50 1987-01-01
a 52 1987-01-01
a 33 1989-05-03
b 10 1979-08-28
b 52 1988-02-23
b 45 1988-02-23
b 33 1988-02-23
b 40 1990-05-20", header=TRUE)

Upvotes: 1

Related Questions