user697473

Reputation: 2293

specify variable names when grouping

I am using dplyr v1.0.2 to manipulate tibbles. I would like to call group_by() with a function or a regular expression that specifies the relevant variable names (the ... argument). The only solution that I've found is clunky. Is there a relatively simple way?

Here is a minimal example that demonstrates the problem:

library(dplyr)
data(iris)
iris[, -(rbinom(1, 1, .5) + 1) ] %>%  # randomly drop "Sepal.Length" or "Sepal.Width"
  group_by(matches("^Sepal\\."))

In the third line, I randomly drop one of the two "Sepal" columns. In the last line, I want to group by the remaining "Sepal" column. The problem is that I don't know its name: it could be either "Sepal.Length" or "Sepal.Width". And the group_by() call in the last line doesn't work: it predictably fails with a "matches() must be used within a *selecting* function" error message.

By contrast, this code works, but it is a bit clunky:

iris[, -(rbinom(1, 1, .5) + 1) ]  %>%
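  # find the surviving Sepal column by name, convert it to a symbol, and unquote it with !!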
  group_by(!!as.name(grep('Sepal', colnames(.), val = TRUE)))

Is there a simpler way to do the grouping on the second line?

Upvotes: 1

Views: 392

Answers (1)

Agaz Wani

Reputation: 5684

What about using across() to select the columns?

iris[, -(rbinom(1, 1, .5) + 1) ]  %>%
  group_by(across(starts_with('Sepal')))

# A tibble: 150 x 4
# Groups:   Sepal.Length [35]
   Sepal.Length Petal.Length Petal.Width Species
          <dbl>        <dbl>       <dbl> <fct>  
 1          5.1          1.4         0.2 setosa 
 2          4.9          1.4         0.2 setosa 
 3          4.7          1.3         0.2 setosa 
 4          4.6          1.5         0.2 setosa 
 5          5            1.4         0.2 setosa 
 6          5.4          1.7         0.4 setosa 
 7          4.6          1.4         0.3 setosa 
 8          5            1.5         0.2 setosa 
 9          4.4          1.4         0.2 setosa 
10          4.9          1.5         0.1 setosa 
# … with 140 more rows
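
If you would rather keep the regular expression from the question, across() accepts the same tidyselect helpers, so a matches() version should work as well (a minimal sketch):

iris[, -(rbinom(1, 1, .5) + 1) ]  %>%
  group_by(across(matches("^Sepal\\.")))  # reuse the original regex inside across()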

Upvotes: 1
