Create a lag/laged variable based on a date rather than row number with dplyr in R

Supouse one has a panel data similar to the following, (e.x.) with two variables "dates" and "sales":

  set.seed(123)
  df <- data.frame(
  "dates" = c(as.Date("2021-01-01")+0:30,
              as.Date("2022-01-01")+0:30),
  "sales" = 0:61+(sample(1:30,62,replace = T)))

Notice:

My attempts so far:

1st Try:

I've been trying to use dplyr's function "lag()", but the problem is that it creates a lag but based on the row number rather than the date in "dates". So it works well but when having time continuos datasets, but not with discontinuous ones:

df <- df %>% 
  mutate(lag_1day = lag(sales,order_by = dates,1))

See that despite that all continuos dates are fine, when it reaches the discontinuity, it assigns the last day by row and not by date, therefore assigning a value instead of a "NA"

> df[30:34,]
        dates sales lag_1day
30 2021-01-30    52       58
31 2021-01-31    34       52
32 2022-01-01    55       34
33 2022-01-02    60       55
34 2022-01-03    36       60

2nd Try:

Another way I've been using is by creating a copy of the first data frame, then adding to "dates" the required amount of days for the lag, and finally merging both data frames. It works perfectly (the outcome is the desired result), but I think is not efficient since is not a "one-shot" solution, and it's creating another data frame.

df <- merge(
  df,(df %>% 
        mutate(dates = dates+1) %>% 
        rename(lag_1day = sales)), 
  by = "dates", all.x = T)


> df[30:34,]
        dates sales lag_1day
30 2021-01-30    59       37
31 2021-01-31    49       59
32 2022-01-01    35       NA
33 2022-01-02    46       35
34 2022-01-03    50       46

Upvotes: 2

Views: 1284

Answers (1)

Neoleogeo
Neoleogeo

Reputation: 323

You could try using if_else inside the mutate. It looks less clean, and not really efficient timewise, as it would need to make n comparisons to get the answer, but it would be just one pipe.

Not sure about the exact functions but in pseudocode

df <- df %>% 
  mutate(lag_1day = 
             ifelse(lag(date,1))+1== date), 
                    lag(sales,1), 
                    NA)

Upvotes: 0

Related Questions