Reputation: 779
Does anybody have an explanation for this result when using the dplyr package?
I have a data frame df:
library(dplyr)
df = data_frame(
'id' = c(1,2,2,2,2,3,3,3,3),
'start' = c(881, 1611, 1611, 1642, 1764, 0, 0, 28, 59),
'end' = c(1089, 1819, 1819, 1850, 1972, 208, 208,236, 267))
which looks like this:
# Source: local data frame [9 x 3]
#
# id start end
# (dbl) (dbl) (dbl)
# 1 1 881 1089
# 2 2 1611 1819
# 3 2 1611 1819
# 4 2 1642 1850
# 5 2 1764 1972
# 6 3 0 208
# 7 3 0 208
# 8 3 28 236
# 9 3 59 267
After grouping by id and applying a lag to the end column, I expected to get exactly one missing value for each id.
df %>%
group_by(id) %>%
mutate(end.prev = lag(end))
But I get:
# Source: local data frame [9 x 4]
# Groups: id [3]
#
# id start end end.prev
# (dbl) (dbl) (dbl) (dbl)
# 1 1 881 1089 NA
# 2 2 1611 1819 NA
# 3 2 1611 1819 1819
# 4 2 1642 1850 1819
# 5 2 1764 1972 1850
# 6 3 0 208 NA
# 7 3 0 208 NA <- I don't understand this NA
# 8 3 28 236 NA <- Neither this one
# 9 3 59 267 NA <- nor this other
I am using the latest version available on CRAN, dplyr 0.4.3 (my R version is 3.2.5).
Upvotes: 4
Views: 5779
Reputation: 883
There have been several problems with this over time. The first was that, after reloading the environment, lag() could end up resolving to the function from stats instead of the dplyr one, so you sometimes have to call dplyr::lag() explicitly.
But the general problem here is the group_by(). The problem should be solved after you ungroup() your tbl.
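For illustration, here is a minimal sketch of the explicit call on the question's data. It assumes a reasonably recent dplyr is installed; the name res is just picked for this example, and only the columns needed for the lag are reproduced. Namespacing the call as dplyr::lag() rules out stats::lag() being picked up by accident, and ungroup() at the end drops the grouping once the per-id lag has been computed.

```r
library(dplyr)

# Same id and end values as in the question (start is not needed for the lag)
df <- data.frame(
  id  = c(1, 2, 2, 2, 2, 3, 3, 3, 3),
  end = c(1089, 1819, 1819, 1850, 1972, 208, 208, 236, 267)
)

res <- df %>%
  group_by(id) %>%
  # Explicit namespace: stats::lag() shifts the time base of a ts object
  # and leaves the values untouched, which is not what is wanted here
  mutate(end.prev = dplyr::lag(end)) %>%
  # Drop the grouping so later steps do not run per id by surprise
  ungroup()

print(res)
```

If the lag is applied per group as intended, end.prev should contain exactly one NA per id, at the first row of each group.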
Upvotes: 4
Reputation: 2022
I am using dplyr version 1.0.5 and it seems to be working. If the exact version is not important to you, then maybe just upgrade dplyr to the latest version.
library(tidyverse)
df = tibble(
'id' = c(1,2,2,2,2,3,3,3,3),
'start' = c(881, 1611, 1611, 1642, 1764, 0, 0, 28, 59),
'end' = c(1089, 1819, 1819, 1850, 1972, 208, 208,236, 267))
df %>%
group_by(id) %>%
mutate(end.prev = lag(end))
#> # A tibble: 9 x 4
#> # Groups: id [3]
#> id start end end.prev
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 881 1089 NA
#> 2 2 1611 1819 NA
#> 3 2 1611 1819 1819
#> 4 2 1642 1850 1819
#> 5 2 1764 1972 1850
#> 6 3 0 208 NA
#> 7 3 0 208 208
#> 8 3 28 236 208
#> 9 3 59 267 236
Created on 2021-04-16 by the reprex package (v2.0.0)
Upvotes: 1