Reputation: 8247
I am working with R Shiny for some exploratory data analysis. I have two checkbox inputs that contain only the user-selected options. The first checkbox input contains only the categorical variables; the second contains only numeric variables. Next, I apply a `group_by` to these two selections:
var1 <- input$variable1 # Checkbox with categorical variables
var2 <- input$variable2 # Checkbox with numerical variables
v$data <- dataset %>%
  group_by_(var1) %>%
  summarize_(Sum = interp(~sum(x), x = as.name(var2))) %>%
  arrange(desc(Sum))
When only one categorical variable is selected, this `group_by` works perfectly. When multiple categorical variables are chosen, the checkbox returns a character vector of column names. How do I pass this vector of column names to dplyr's `group_by`?
Upvotes: 49
Views: 61779
Reputation: 206177
With more recent versions of `dplyr`, you should use `across()` along with a tidyselect helper function. See `help("language", "tidyselect")` for a list of all the helper functions. In this case, if you want all columns named in a character vector, use `all_of()`:
cols <- c("mpg","hp","wt")
mtcars %>%
  group_by(across(all_of(cols))) %>%
  summarize(x = mean(gear))
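Applied to the situation in the question, the same pattern might look like the sketch below. The values of `var1` and `var2` are hypothetical stand-ins for `input$variable1` and `input$variable2`, and `mtcars` stands in for `dataset`. It assumes dplyr 1.0.0 or later and a single numeric column in `var2`.

```r
library(dplyr)

# Hypothetical stand-ins for the Shiny inputs in the question
var1 <- c("cyl", "am")  # one or more categorical column names
var2 <- "hp"            # a single numeric column name

result <- mtcars %>%
  group_by(across(all_of(var1))) %>%                      # group by every name in var1
  summarize(Sum = sum(.data[[var2]]), .groups = "drop") %>%  # look up var2 by name
  arrange(desc(Sum))
```

The same code works whether `var1` has length one or several, so no special case is needed for a single selection.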
In older versions of `dplyr`, if you have a vector of variable names, you would pass them to the `.dots=` parameter of `group_by_()`, which is now deprecated. For example:
mtcars %>%
  group_by_(.dots = c("mpg", "hp", "wt")) %>%
  summarize(x = mean(gear))
Upvotes: 66
Reputation: 6278
You can use the helpers from the `rlang` package, which is created by the same team that created `dplyr`. When using `dplyr` and other tidyverse packages, you don't have to load the `rlang` package in order to use those helpers.
Specifically, you can use the `syms` function and the `!!!` operator like so:
library(dplyr)
group_cols <- c("vs", "am")
mtcars %>%
  group_by(!!!syms(group_cols)) %>%
  summarize(mean_wt = mean(wt))
This closely related question and answer explains how the `!!` operator and `sym` function are used for a single column name (i.e. a length-one character vector).
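For comparison, a minimal sketch of the single-column case with `!!` and `sym` could look like this. The name `group_col` is a hypothetical example, and the code relies on `dplyr` re-exporting `sym` from `rlang`.

```r
library(dplyr)

group_col <- "cyl"  # hypothetical length-one character vector

by_cyl <- mtcars %>%
  group_by(!!sym(group_col)) %>%   # sym() turns the string into a symbol, !! unquotes it
  summarize(mean_wt = mean(wt))
```

`!!` unquotes one symbol, while `!!!` splices a list of symbols, which is why the multi-column version uses `syms` with `!!!`.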
Upvotes: 10
Reputation: 1593
With `dplyr` 1.0.0, we have the following possibility based on the "normal" `group_by`:
library(dplyr)
group_cols <- c("vs", "am")
mtcars %>%
  group_by(across(all_of(group_cols))) %>%
  summarize(mean_wt = mean(wt))
Upvotes: 11
Reputation: 6278
Recent versions of the `dplyr` package include variants of `group_by`, such as `group_by_if` and `group_by_at` (both superseded by `across()` as of `dplyr` 1.0.0). You can use these to perform column selections with syntax that is similar to the `select` function.
Just as you could select a list of columns with `select(my_data, one_of(group_cols))`, you can use `group_by_at` to do the following:
library(dplyr)
group_cols <- c("vs", "am")
mtcars %>%
  group_by_at(.vars = vars(one_of(group_cols))) %>%
  summarize(mean_wt = mean(wt))
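In current tidyselect, `one_of()` is superseded by `all_of()` and `any_of()`, which differ in how they treat names that are not in the data. The sketch below illustrates the difference, assuming dplyr 1.0.0 or later; `not_a_column` is a deliberately invalid, hypothetical name.

```r
library(dplyr)

# Hypothetical input: the last name does not exist in mtcars
group_cols <- c("vs", "am", "not_a_column")

# any_of() keeps the names that exist and silently ignores the rest
lenient <- mtcars %>%
  group_by(across(any_of(group_cols)))
group_vars(lenient)

# all_of() raises an error when any name is missing
strict_failed <- tryCatch(
  {
    mtcars %>% group_by(across(all_of(group_cols)))
    FALSE
  },
  error = function(e) TRUE
)
```

For user-driven selections such as Shiny inputs, `all_of()` is probably the safer default because a mistyped or stale column name fails loudly instead of being dropped.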
Upvotes: 5