I would like to lag a variable in my grouped data with dplyr, using the lag() function. I found similar issues, such as "dplyr lag function returns NAs", where someone pointed to https://github.com/tidyverse/dplyr/issues/1540, a bug Hadley had fixed in 2016. So I assume it's solved. Why does my lag command still return NA?
I use R version 3.6.1 and dplyr_0.8.3.
library(tidyverse)
data = data.frame(id=c(1,1,1,2,2,2,3,3,3,4,4,4), time=seq(1:3), x=rep(c(5:8), each=3))

data %>%
  group_by(id) %>%
  mutate(x_lag = lag(x, n=1, default=NA, order_by=TRUE)) %>%
  select(id, time, x, x_lag)

data %>%
  group_by(id) %>%
  mutate(x_lag = lag(x, n=1, default=NA, order_by=FALSE)) %>%
  select(id, time, x, x_lag)

data %>%
  group_by(id) %>%
  arrange(id) %>%
  mutate(x_lag = lag(x, n=1, default=NA, order_by=FALSE)) %>%
  select(id, time, x, x_lag)

data %>%
  group_by(id) %>%
  mutate(x_lag = lag(x, n=1, default=0, order_by=TRUE)) %>%
  select(id, time, x, x_lag)
# A tibble: 8 x 4
# Groups: id, time [12]
id time x x_lag
<dbl> <int> <int> <int>
1 1 1 5 NA
2 1 2 5 NA
3 1 3 5 NA
4 2 1 6 NA
5 2 2 6 NA
6 2 3 6 NA
7 3 1 7 NA
8 3 2 7 NA
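I think you're just not using the order_by argument correctly. It expects a vector to order by (for example the time column), not a single TRUE or FALSE. A length-1 order_by effectively orders just one element, which is likely why every row in your output comes back as NA. In your use case, you probably don't need order_by at all.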
data %>%
  group_by(id) %>%
  mutate(x_lag = lag(x, n=1, default=0)) %>%
  select(id, time, x, x_lag)
#> # A tibble: 12 x 4
#> # Groups: id [4]
#> id time x x_lag
#> <dbl> <int> <int> <dbl>
#> 1 1 1 5 0
#> 2 1 2 5 5
#> 3 1 3 5 5
#> 4 2 1 6 0
#> 5 2 2 6 6
#> 6 2 3 6 6
#> 7 3 1 7 0
#> 8 3 2 7 7
#> 9 3 3 7 7
#> 10 4 1 8 0
#> 11 4 2 8 8
#> 12 4 3 8 8
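If the rows are not already in time order and you do want an explicit ordering, order_by should be given the column to sort by. The sketch below uses a small hypothetical data frame (df, with rows deliberately shuffled within each id and x values that differ) so the effect of order_by = time is visible.

```r
library(dplyr)

# Hypothetical data: rows are out of time order within each id
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  time = c(3, 1, 2, 2, 3, 1),
  x    = c(30, 10, 20, 200, 300, 100)
)

out <- df %>%
  group_by(id) %>%
  # order_by takes a vector (the time column here), not TRUE/FALSE;
  # the lag is computed in time order but returned in the original row order
  mutate(x_lag = lag(x, n = 1, order_by = time)) %>%
  ungroup() %>%
  arrange(id, time)   # sort only for display

print(out)
```

After sorting for display, each id's earliest time gets NA and every later row gets the x value from the preceding time point, even though the input rows were shuffled.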