asachet

Reputation: 6921

complex column selection in dplyr group_by

I would like to use, within a group_by call, dplyr's column selectors like starts_with(), ends_with(), matches(), ..., or even the syntax -colName.

(Silly) example of the syntax I am after:

library("dplyr")

# I would like to do something like this
mtcars %>% 
   group_by(matches("a")) %>%
   summarise(mpg=mean(mpg))
# but I get a "wrong result size" error

I was hoping it would work, by analogy with:

mtcars %>% select(matches("a"))

which here would select the columns drat, am, gear, and carb.

To be crystal clear: I want to use matches("a") (or equivalent) to achieve the same output as:

mtcars %>% 
group_by(drat, am, gear, carb) %>%
summarise(mpg=mean(mpg))

I am only interested in answers using dplyr. Thanks!


The current answer, while good, only allows selecting columns with a regex.

I am still looking for a more general answer that allows the full range of dplyr's selection syntax. Of course I can massage a regex to select what I want, but I would like something that integrates more nicely with dplyr (especially to use the -colName syntax). I am going to leave this open for a while.

Upvotes: 5

Views: 714

Answers (2)

asachet

Reputation: 6921

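group_by_at() was added to dplyr some time in 2017 and does just that. The selection helper has to be wrapped in vars() so that it is evaluated against the column names of the data:

mtcars %>% 
   group_by_at(vars(matches("a"))) %>%
   summarise(mpg = mean(mpg))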
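
For readers on a newer dplyr: from version 1.0.0 on, the scoped verbs such as group_by_at() are superseded, and across() can be used inside group_by() with the usual selection syntax, including negative selection. The following is only a sketch, assuming dplyr >= 1.0.0 is installed; the object names by_match and by_negation are illustrative and not part of the original question.

```r
library(dplyr)

# Group by every column whose name contains "a" (drat, am, gear, carb),
# then average mpg within each group.
by_match <- mtcars %>%
  group_by(across(matches("a"))) %>%
  summarise(mpg = mean(mpg), .groups = "drop")

# The -colName style also works: group by everything except the listed columns.
by_negation <- mtcars %>%
  group_by(across(-c(mpg, cyl, disp, hp, wt, qsec, vs))) %>%
  summarise(mpg = mean(mpg), .groups = "drop")
```

Both pipelines group by the same four columns, so they should produce the same summary table.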

Upvotes: 0

akuiper

Reputation: 214987

Here is one way to build your own group_at(), which I don't think exists, by combining matches() with the standard-evaluation (SE) function group_by_():

mtcars %>% 
      group_by_(.dots = names(mtcars)[matches("a", vars = names(mtcars))]) %>%
      summarise(mpg = mean(mpg))

#Source: local data frame [26 x 5]
#Groups: drat, am, gear [?]

#    drat    am  gear  carb   mpg
#   <dbl> <dbl> <dbl> <dbl> <dbl>
#1   2.76     0     3     1 18.10
#2   2.76     0     3     2 15.50
#3   2.93     0     3     4 10.40
#4   3.00     0     3     4 10.40
#5   3.07     0     3     3 16.30
#6   3.08     0     3     1 21.40
#7   3.08     0     3     2 19.20
#8   3.15     0     3     2 16.95
#9   3.21     0     3     4 14.30
#10  3.23     0     3     4 14.70
# ... with 16 more rows

Or equivalently, just use grep:

mtcars %>% 
      group_by_(.dots = grep('a', names(mtcars), value = TRUE)) %>%
      summarise(mpg=mean(mpg))

Upvotes: 5
