Add a column to data frame n days before an observation

Question

I need an more efficient way to add a marker that shows that an observation is registered 3 days before a specific date. The problem is that these dates are not necessarily consecutive i.e. they can be missing, yet I need the marker to ignore the missing days. The example below illustrates the problem and what I need more clearly:

library(tidyverse)
library(lubridate)

df <- data.frame("Date" = c(as_date(0:9)), ID = rep(paste0("ID", 1:3), each = 10))
df <- df[-c(5, 13, 24),]

date_before <- "1970-01-07"

df[, "three_days_before"] <- 0

for(i in df$ID){

  cond <- df[, "ID"] == i & 
    df[, "Date"] == date_before

  before_n <- (which(cond)-3):(which(cond)-1)

  df[before_n, "three_days_before"] <- 1

}

df

The loop gives me what I need (three days are marked each time regardless their inclusion in the data.frame), yet it takes quite a long time to calculate on a larger data set. Can someone recommend a better way?

G. Grothendieck · Accepted Answer

1) Apply a rolling window separately for each ID. The rolling window function checks whether any of the next 3 elements of Date equal date_before. (Specifying a width of list(1:3) says to use offsets 1, 2 and 3 which means the next 3 ahead.) Note that there are no next 3 elements for the last value so we use fill to fill it in. We add 0 to convert from logical to numeric. This solution involves only two lines of code and no explicit looping.

library(zoo)

roll <- function(x) rollapply(x, list(1:3), FUN = any, partial = TRUE, fill = FALSE)
transform(df, before = ave(Date == date_before, ID, FUN = roll) + 0)

giving:

         Date  ID before
1  1970-01-01 ID1      0
2  1970-01-02 ID1      0
3  1970-01-03 ID1      1
4  1970-01-04 ID1      1
6  1970-01-06 ID1      1
7  1970-01-07 ID1      0
8  1970-01-08 ID1      0
9  1970-01-09 ID1      0
10 1970-01-10 ID1      0
11 1970-01-01 ID2      0
12 1970-01-02 ID2      0
14 1970-01-04 ID2      1
15 1970-01-05 ID2      1
16 1970-01-06 ID2      1
17 1970-01-07 ID2      0
18 1970-01-08 ID2      0
19 1970-01-09 ID2      0
20 1970-01-10 ID2      0
21 1970-01-01 ID3      0
22 1970-01-02 ID3      0
23 1970-01-03 ID3      1
25 1970-01-05 ID3      1
26 1970-01-06 ID3      1
27 1970-01-07 ID3      0
28 1970-01-08 ID3      0
29 1970-01-09 ID3      0
30 1970-01-10 ID3      0

2) This could also be expressed as a pipeline where roll is from above:

library(dplyr)
library(zoo)

df %>%
   group_by(ID) %>%
   mutate(before = roll(Date == date_before)) %>%
   ungroup

Add a column to data frame n days before an observation

Answers (2)

Sample data

Related Questions