Reputation: 6921
I would like to use, within a group_by call, dplyr's column selectors like starts_with(), ends_with(), matches(), ..., or even the syntax -colName.
(Silly) example of the syntax I am after:
library("dplyr")
# I would like to do something like this
mtcars %>%
group_by(matches("a")) %>%
summarise(mpg=mean(mpg))
# but I get a "wrong result size" error
I was hoping it would work, by analogy with:
mtcars %>% select(matches("a"))
which here would select the columns drat, am, gear, and carb.
To be crystal clear: I want to use matches("a") (or equivalent) to achieve the same output as:
mtcars %>%
group_by(drat, am, gear, carb) %>%
summarise(mpg=mean(mpg))
I am only interested in answers using dplyr. Thanks!
The current answer, while good, only allows selecting columns with a regex.
I am still looking for a more general answer that would allow the use of the full range of dplyr's selection syntax. Of course I can massage any regex to select what I want, but I wish I had something that integrates more nicely with dplyr (especially to use the -colName syntax). I am going to leave this open for a while.
Upvotes: 5
Views: 714
Reputation: 6921
group_by_at was added to dplyr some time in 2017 and does just that.
# Wrap the helper in vars(), the documented way to pass selections to *_at() verbs
mtcars %>%
group_by_at(vars(matches("a"))) %>%
summarise(mpg=mean(mpg))
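For readers on a newer dplyr, the same grouping can probably be written with across(), which accepts the full selection syntax, including negative selection. This is a minimal sketch that assumes dplyr >= 1.0.0 (where across() and the .groups argument exist). The names by_a, by_rest and res are only illustrative.

```r
library(dplyr)

# Assumption: dplyr >= 1.0.0, where across() can be used inside group_by().
# Group by every column whose name matches the regex "a".
by_a <- mtcars %>%
  group_by(across(matches("a")))

# Negative selection: group by everything except the listed columns.
by_rest <- mtcars %>%
  group_by(across(-c(mpg, cyl, disp, hp, wt, qsec, vs)))

# Summarise as in the question; .groups = "drop" returns an ungrouped result.
res <- by_a %>%
  summarise(mpg = mean(mpg), .groups = "drop")
```

Both by_a and by_rest should end up grouped by drat, am, gear and carb, which can be checked with group_vars().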
Upvotes: 0
Reputation: 214987
Here is an option to construct your own group_at(), which I don't think exists, using matches() and the SE (standard evaluation) function group_by_():
mtcars %>%
group_by_(.dots = names(mtcars)[matches("a", vars = names(mtcars))]) %>%
summarise(mpg = mean(mpg))
#Source: local data frame [26 x 5]
#Groups: drat, am, gear [?]
# drat am gear carb mpg
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2.76 0 3 1 18.10
#2 2.76 0 3 2 15.50
#3 2.93 0 3 4 10.40
#4 3.00 0 3 4 10.40
#5 3.07 0 3 3 16.30
#6 3.08 0 3 1 21.40
#7 3.08 0 3 2 19.20
#8 3.15 0 3 2 16.95
#9 3.21 0 3 4 14.30
#10 3.23 0 3 4 14.70
# ... with 16 more rows
Or equivalently, just use grep:
mtcars %>%
group_by_(.dots = grep('a', names(mtcars), value = TRUE)) %>%
summarise(mpg=mean(mpg))
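The same "resolve the names first" idea can be sketched with select() instead of grep, which would make the whole selection syntax (helpers and -colName) available. This assumes dplyr >= 0.7.0, where group_by_at() accepts a character vector of column names. group_cols is a hypothetical helper variable, and dropping carb is only there to illustrate negative selection.

```r
library(dplyr)

# Resolve the grouping columns with the regular select() syntax,
# here: every column matching "a", minus carb.
group_cols <- mtcars %>%
  select(matches("a"), -carb) %>%
  names()

# Pass the resulting character vector of names to group_by_at().
res <- mtcars %>%
  group_by_at(group_cols) %>%
  summarise(mpg = mean(mpg))
```

The trade-off is that the data frame is named twice, once to resolve the columns and once to group.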
Upvotes: 5