Reputation: 159
I want to run the same commands (basically some general filter(), mutate() summarise() etc.) over a bunch of variables with the group_by() command. Is there a way to bundle this process instead of having to use a block of code for each individual variable? I've tried using a for loop, which unfortunately didn't work for me. My goal is to only have to write one block of code for all group_by variables.
library(dplyr)
df <- mtcars
Test1 <- df %>%
group_by(mpg) %>%
filter(hp > 100) %>%
summarise(N = n())
Test2 <- df %>%
group_by(cyl) %>%
filter(hp > 100) %>%
summarise(N = n())
Test3 <- df %>%
group_by(disp) %>%
filter(hp > 100) %>%
summarise(N = n())
Thank you in advance!
Upvotes: 0
Views: 469
Reputation: 5254
You can create the vector of variables you want to group by and iterate over that using purrr::map()
and use dplyr::count()
to succinctly group_by
& summarize()
. One key thing here is to surround the first map()
statement with {}
curly braces to allow you to pipe the .
into a something other than the first argument. The other key thing here is that if you want to filter by hp > 100
, do that upstream to avoid repeating it. Finally I pivot_longer()
so that the variable name in each output can be combined into a single data.frame
with the same column names.
Also note that for this example I chose different variables that work better as a grouping keys because they are basically discrete variables.
library(tidyverse)
varz <- c("cyl", "am", "gear", "carb")
mtcars %>%
filter(hp > 100) %>%
{map(varz, \(x){count(., !!sym(x))})} %>%
map(~pivot_longer(.x, -n, names_to = "var")) %>%
bind_rows() %>%
select(var, value, n)
#> # A tibble: 14 x 3
#> var value n
#> <chr> <dbl> <int>
#> 1 cyl 4 2
#> 2 cyl 6 7
#> 3 cyl 8 14
#> 4 am 0 16
#> 5 am 1 7
#> 6 gear 3 14
#> 7 gear 4 5
#> 8 gear 5 4
#> 9 carb 1 2
#> 10 carb 2 6
#> 11 carb 3 3
#> 12 carb 4 10
#> 13 carb 6 1
#> 14 carb 8 1
Created on 2022-02-18 by the reprex package (v2.0.1)
Upvotes: 0
Reputation: 119
Since you already use a tidyverse
approach you could just pivot the table:
library(tidyverse)
df <- mtcars
df %>%
filter(hp > 100) %>%
select(mpg, cyl, hp) %>%
pivot_longer(cols = c(mpg, cyl, hp),
names_to = "var",
values_to = "value") %>%
group_by(var) %>%
summarise(N = n())
#> # A tibble: 3 × 2
#> var N
#> <chr> <int>
#> 1 cyl 23
#> 2 hp 23
#> 3 mpg 23
Would that solve your problem? This way you can put any functions after the pivoted dataset and apply them to all variables as you wish. You can pivot it back if you prefer to have them as variables again. The main advantage of this approach is that you never split your dataset. However, there are downsides to this as well.
Another option could be using purrr
.
Upvotes: -1
Reputation: 125697
One option would be to store the variable names in a vector or list, then loop over this list using e.g. lapply
where I make use of the .data
pro-noun:
library(dplyr)
df <- mtcars
var_list <- c("mpg", "cyl", "disp")
lapply(var_list, function(x) {
df %>%
group_by(.data[[x]]) %>%
filter(hp > 100) %>%
summarise(N = n())
})
#> [[1]]
#> # A tibble: 18 × 2
#> mpg N
#> <dbl> <int>
#> 1 10.4 2
#> 2 13.3 1
#> 3 14.3 1
#> 4 14.7 1
#> 5 15 1
#> 6 15.2 2
#> 7 15.5 1
#> 8 15.8 1
#> 9 16.4 1
#> 10 17.3 1
#> 11 17.8 1
#> 12 18.1 1
#> 13 18.7 1
#> 14 19.2 2
#> 15 19.7 1
#> 16 21 2
#> 17 21.4 2
#> 18 30.4 1
#>
#> [[2]]
#> # A tibble: 3 × 2
#> cyl N
#> <dbl> <int>
#> 1 4 2
#> 2 6 7
#> 3 8 14
#>
#> [[3]]
#> # A tibble: 18 × 2
#> disp N
#> <dbl> <int>
#> 1 95.1 1
#> 2 121 1
#> 3 145 1
#> 4 160 2
#> 5 168. 2
#> 6 225 1
#> 7 258 1
#> 8 276. 3
#> 9 301 1
#> 10 304 1
#> 11 318 1
#> 12 350 1
#> 13 351 1
#> 14 360 2
#> 15 400 1
#> 16 440 1
#> 17 460 1
#> 18 472 1
EDIT To store the results in a dataframe I would first name the vector of variables, inside the function use the same name for the grouping column. i.e. I simply used "value" then use bind_rows to bind the results together where as an identifier I added the variable name:
var_list <- c("mpg", "cyl", "disp")
names(var_list) <- var_list
lapply(var_list, function(x) {
df %>%
group_by(value = .data[[x]]) %>%
filter(hp > 100) %>%
summarise(N = n())
}) %>%
bind_rows(.id = "var")
#> # A tibble: 39 x 3
#> var value N
#> <chr> <dbl> <int>
#> 1 mpg 10.4 2
#> 2 mpg 13.3 1
#> 3 mpg 14.3 1
#> 4 mpg 14.7 1
#> 5 mpg 15 1
#> 6 mpg 15.2 2
#> 7 mpg 15.5 1
#> 8 mpg 15.8 1
#> 9 mpg 16.4 1
#> 10 mpg 17.3 1
#> # ... with 29 more rows
Upvotes: 0