Reputation: 1270
We released the package quickpsy a few years ago (paper in the R Journal). The package used base R functions but also made extensive use of functions from what was then called the Hadleyverse. We are now developing a new version that mostly uses tidyverse functions and incorporates the new non-standard evaluation approach, and we found that it is much slower (more than four times slower). For example, we found that purrr::map is much slower than dplyr::do (which is deprecated):
library(tidyverse)

system.time(
  mtcars %>%
    group_by(cyl) %>%
    do(head(., 2))
)

system.time(
  mtcars %>%
    group_by(cyl) %>%
    nest() %>%
    mutate(temp = map(data, ~head(., 2))) %>%
    unnest(temp)
)
We also found that functions like pull are very slow.
We are not sure whether the tidyverse is simply not meant for this type of programming or whether we are using it incorrectly.
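One way to check the pull observation is to time repeated calls against base extraction with `[[`. The following is only a minimal sketch: the call count `n_calls` is an arbitrary choice, and using `[[` as the comparison point is an assumption, not something the package is known to do. It also checks that both calls return the same vector, so one could stand in for the other inside a hot loop.

```r
library(dplyr)

# Arbitrary number of repetitions so that the per-call overhead adds up
# to something measurable (assumption: 10000 is enough on most machines).
n_calls <- 10000

# Time repeated column extraction with dplyr::pull()
t_pull <- system.time(
  for (i in seq_len(n_calls)) pull(mtcars, mpg)
)[["elapsed"]]

# Time the same extraction with base R's `[[`
t_base <- system.time(
  for (i in seq_len(n_calls)) mtcars[["mpg"]]
)[["elapsed"]]

# Both approaches should return the same plain numeric vector
same_values <- identical(pull(mtcars, mpg), mtcars[["mpg"]])

print(c(pull = t_pull, base = t_base))
```

The absolute numbers depend on the machine and the dplyr version, so only the ratio between the two timings is informative.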
Upvotes: 4
Views: 1246
Reputation: 1270
For this particular example, the slowness caused by the nest and unnest computations can be avoided by using group_modify:
system.time(
  mtcars %>%
    group_by(cyl) %>%
    group_modify(~head(., 2))
)
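As a sanity check, the sketch below compares the rows returned by group_modify with those returned by do. It assumes a dplyr version that provides group_modify. The names `by_do` and `by_gm` are introduced here for illustration. The one visible difference is that group_modify places the grouping column first, so the columns are reordered before comparing.

```r
library(dplyr)

# First two rows per group via do()
by_do <- mtcars %>%
  group_by(cyl) %>%
  do(head(., 2)) %>%
  ungroup()

# Same task via group_modify(); .x is the data for one group
by_gm <- mtcars %>%
  group_by(cyl) %>%
  group_modify(~ head(.x, 2)) %>%
  ungroup()

# group_modify() puts the grouping column (cyl) first, so align the
# column order with the do() result before comparing values.
same_rows <- isTRUE(all.equal(
  as.data.frame(by_do),
  as.data.frame(by_gm[names(by_do)]),
  check.attributes = FALSE
))

print(same_rows)
```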
Upvotes: 0
Reputation: 17642
slice() is the proper tool to use if you want the first two rows of each group. Both do() and nest() %>% mutate(map()) %>% unnest() are too heavy and use more memory:
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(purrr)
library(tidyverse)
system.time(
  mtcars %>%
    group_by(cyl) %>%
    do(head(., 2))
)
#>    user  system elapsed
#>   0.065   0.003   0.075

system.time(
  mtcars %>%
    group_by(cyl) %>%
    nest() %>%
    mutate(temp = map(data, ~head(., 2))) %>%
    unnest(temp)
)
#>    user  system elapsed
#>   0.024   0.000   0.024

system.time(
  mtcars %>%
    group_by(cyl) %>%
    slice(1:2)
)
#>    user  system elapsed
#>   0.002   0.000   0.002
Created on 2018-10-23 by the reprex package (v0.2.1.9000)
See also benchmark results in this tidyr issue
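A single system.time() call on a 32-row data frame is noisy, and the first call can include one-off setup cost. A rough way to get steadier numbers is to repeat each approach many times. The sketch below uses only system.time() in a loop; the helper `time_it`, the list `approaches`, and the repetition count are hypothetical names and values chosen for illustration, and a dedicated benchmarking package would likely give better estimates.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical helper: total elapsed time for `times` calls of a function
time_it <- function(f, times = 50) {
  system.time(for (i in seq_len(times)) f())[["elapsed"]]
}

# Each approach wrapped in a function so it can be called repeatedly
approaches <- list(
  do = function() {
    mtcars %>% group_by(cyl) %>% do(head(., 2))
  },
  nest_map = function() {
    mtcars %>%
      group_by(cyl) %>%
      nest() %>%
      mutate(temp = map(data, ~ head(.x, 2))) %>%
      unnest(temp)
  },
  group_modify = function() {
    mtcars %>% group_by(cyl) %>% group_modify(~ head(.x, 2))
  },
  slice = function() {
    mtcars %>% group_by(cyl) %>% slice(1:2)
  }
)

# Named numeric vector of total elapsed seconds per approach
timings <- vapply(approaches, time_it, numeric(1))

print(sort(timings))
```

The ranking can change between dplyr and tidyr versions, so it is worth rerunning this against the versions the package actually depends on.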
Upvotes: 3