K. Fujisaki

Reputation: 1

Get a list of grouped data in R using dplyr::group_by_() and avoiding a for loop

I would like to get a list of summarised tibbles, one for each of several sets of group_by_ dots applied to a data frame.

library(tidyverse)
data(mtcars)

# create dots of groups for dplyr::group_by_ function
dots1 <- lapply(c("am", "gear"), as.symbol)
dots2 <- lapply(c("am", "carb"), as.symbol)
l <- list(dots1, dots2)

# group_by then summarise for one dots
mtcars %>% 
  group_by_(.dots = dots1) %>% 
  summarise(cyl_mean = mean(mpg),
            cyl_sd = sd(mpg)) 

How can I write code that returns a list of n tibbles, one for each of n dots? Perhaps something using purrr::map()? I would like to avoid copy-pasting the code, since I will be using many dots.

I tried

> mtcars %>% group_by_(.dots = l)

but it gave

Error: Can't convert a list to a quosure

I can get the desired output using a for loop, but I was wondering whether there is an alternative that does not use a for loop.

list_groups <- list()
for (i in seq_along(l)) {
  res <- mtcars %>%
    group_by_(.dots = l[[i]]) %>%
    summarise(cyl_mean = mean(mpg),
              cyl_sd = sd(mpg))
  list_groups[[i]] <- res
}
list_groups


[[1]]
# A tibble: 4 x 4
# Groups:   am [?]
     am  gear cyl_mean cyl_sd
  <dbl> <dbl>    <dbl>  <dbl>
1  0     3.00     16.1   3.37
2  0     4.00     21.0   3.07
3  1.00  4.00     26.3   5.41
4  1.00  5.00     21.4   6.66

[[2]]
# A tibble: 9 x 4
# Groups:   am [?]
     am  carb cyl_mean cyl_sd
  <dbl> <dbl>    <dbl>  <dbl>
1  0     1.00     20.3   1.93
2  0     2.00     19.3   3.74
3  0     3.00     16.3   1.05
4  0     4.00     14.3   3.36
5  1.00  1.00     29.1   5.06
6  1.00  2.00     27.0   4.30
7  1.00  4.00     19.3   3.00
8  1.00  6.00     19.7 NaN   
9  1.00  8.00     15.0 NaN   

Upvotes: 0

Views: 175

Answers (2)

Nate

Reputation: 10671

This is how you could do it tidy-eval style, using the newer rlang functions to help with the non-standard evaluation:

library(tidyverse); library(rlang)

l2 <- list(c("am", "gear"), c("am", "carb")) %>%
    map(syms) # syms() turns each character vector into a list of symbols

map(l2, ~ group_by(mtcars, !!!.x) %>% # !!! splices the symbols in as grouping columns of mtcars
        summarise(cyl_mean = mean(mpg), # back to business as usual
                  cyl_sd = sd(mpg)))
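
In dplyr 1.0.0 and later, where the underscore verbs such as group_by_() are deprecated, the grouping columns can also be passed as plain character vectors, without converting them to symbols. A minimal sketch, assuming dplyr >= 1.0.0 (for across() and the .groups argument); the names group_vars and list_groups_modern are only illustrative:

```r
library(dplyr)
library(purrr)

# Each element is one set of grouping columns, given as column names
group_vars <- list(c("am", "gear"), c("am", "carb"))

# One summarised tibble per set of grouping columns
list_groups_modern <- map(group_vars, function(vars) {
  mtcars %>%
    group_by(across(all_of(vars))) %>%   # all_of() selects columns by name
    summarise(cyl_mean = mean(mpg),      # column names kept from the question
              cyl_sd = sd(mpg),
              .groups = "drop")          # return ungrouped tibbles
})

list_groups_modern
```

Because the grouping sets stay as character vectors, they can be built programmatically (for example with combn() over column names) before being mapped over.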

Upvotes: 1

LAP

Reputation: 6695

You could use a simple lapply approach:

list_groups <- lapply(l, function(x) mtcars %>% 
                        group_by_(.dots = x) %>% 
                        summarise(cyl_mean = mean(mpg),
                                  cyl_sd = sd(mpg)))

[[1]]
# A tibble: 4 x 4
# Groups:   am [?]
     am  gear cyl_mean   cyl_sd
  <dbl> <dbl>    <dbl>    <dbl>
1     0     3 16.10667 3.371618
2     0     4 21.05000 3.069745
3     1     4 26.27500 5.414465
4     1     5 21.38000 6.658979

[[2]]
# A tibble: 9 x 4
# Groups:   am [?]
     am  carb cyl_mean   cyl_sd
  <dbl> <dbl>    <dbl>    <dbl>
1     0     1 20.33333 1.934770
2     0     2 19.30000 3.738449
3     0     3 16.30000 1.053565
4     0     4 14.30000 3.362539
5     1     1 29.10000 5.061620
6     1     2 27.05000 4.300000
7     1     4 19.26667 3.002221
8     1     6 19.70000      NaN
9     1     8 15.00000      NaN

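Advantages: you don't need to create an empty result list beforehand, the code is more concise, and it is faster (a median of about 5.5 ms versus about 10.1 ms for the for loop in the benchmark below).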

Unit: milliseconds
             expr      min       lq      mean    median        uq       max neval cld
   lapply_variant 5.130514 5.346550  5.980762  5.515367  5.776913 209.59276  1000  a 
 for_loop_variant 9.298755 9.787714 10.457785 10.064051 10.485171  37.54062  1000   b
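
The code behind this timing table is not shown. The following is only a sketch of how a comparable benchmark might be set up, assuming the microbenchmark package was used (the table layout suggests it) and assuming dplyr >= 1.0.0. The helper summarise_by and the object names are hypothetical, and it groups with across(all_of()) rather than group_by_(), so absolute timings will differ from those above:

```r
library(dplyr)
library(microbenchmark)

group_vars <- list(c("am", "gear"), c("am", "carb"))

# Hypothetical helper: summarise mtcars for one set of grouping columns
summarise_by <- function(vars) {
  mtcars %>%
    group_by(across(all_of(vars))) %>%
    summarise(cyl_mean = mean(mpg), cyl_sd = sd(mpg), .groups = "drop")
}

bm <- microbenchmark(
  lapply_variant = lapply(group_vars, summarise_by),
  for_loop_variant = {
    res <- list()                       # grown inside the loop, as in the question
    for (i in seq_along(group_vars)) {
      res[[i]] <- summarise_by(group_vars[[i]])
    }
    res
  },
  times = 100                           # fewer repetitions than the 1000 used above
)

print(bm)
```

Timings depend on the machine and package versions, so only the relative difference between the two variants is meaningful.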

Upvotes: 1
