Reputation: 808
Supouse one has a panel data similar to the following, (e.x.) with two variables "dates
" and "sales
":
set.seed(123)
df <- data.frame(
"dates" = c(as.Date("2021-01-01")+0:30,
as.Date("2022-01-01")+0:30),
"sales" = 0:61+(sample(1:30,62,replace = T)))
Notice:
%>%
")My attempts so far:
1st Try:
I've been trying to use dplyr's function "lag()
", but the problem is that it creates a lag but based on the row number rather than the date in "dates
". So it works well but when having time continuos datasets, but not with discontinuous ones:
df <- df %>%
mutate(lag_1day = lag(sales,order_by = dates,1))
See that despite that all continuos dates are fine, when it reaches the discontinuity, it assigns the last day by row and not by date, therefore assigning a value instead of a "NA
"
> df[30:34,]
dates sales lag_1day
30 2021-01-30 52 58
31 2021-01-31 34 52
32 2022-01-01 55 34
33 2022-01-02 60 55
34 2022-01-03 36 60
2nd Try:
Another way I've been using is by creating a copy of the first data frame, then adding to "dates
" the required amount of days for the lag, and finally merging both data frames. It works perfectly (the outcome is the desired result), but I think is not efficient since is not a "one-shot" solution, and it's creating another data frame.
df <- merge(
df,(df %>%
mutate(dates = dates+1) %>%
rename(lag_1day = sales)),
by = "dates", all.x = T)
> df[30:34,]
dates sales lag_1day
30 2021-01-30 59 37
31 2021-01-31 49 59
32 2022-01-01 35 NA
33 2022-01-02 46 35
34 2022-01-03 50 46
Upvotes: 2
Views: 1284
Reputation: 323
You could try using if_else inside the mutate. It looks less clean, and not really efficient timewise, as it would need to make n comparisons to get the answer, but it would be just one pipe.
Not sure about the exact functions but in pseudocode
df <- df %>%
mutate(lag_1day =
ifelse(lag(date,1))+1== date),
lag(sales,1),
NA)
Upvotes: 0