Reputation: 70256
I noticed that when supplying column indices to dplyr::summarise_at, the columns to be summarised are determined after excluding the grouping column(s). I wonder whether that is intended, since with this design the correct column index depends on whether the columns to be summarised are positioned before or after the grouping column(s).
Here's an example:
library(dplyr)
data("mtcars")
# grouping column after summarise columns
mtcars %>% group_by(gear) %>% summarise_at(3:4, mean)
## A tibble: 3 x 3
# gear disp hp
# <dbl> <dbl> <dbl>
#1 3 326.3000 176.1333
#2 4 123.0167 89.5000
#3 5 202.4800 195.6000
# grouping columns before summarise columns
mtcars %>% group_by(cyl) %>% summarise_at(3:4, mean)
## A tibble: 3 x 3
# cyl hp drat
# <dbl> <dbl> <dbl>
#1 4 82.63636 4.070909
#2 6 122.28571 3.585714
#3 8 209.21429 3.229286
# no grouping columns
mtcars %>% summarise_at(3:4, mean)
# disp hp
#1 230.7219 146.6875
# actual third & fourth columns
names(mtcars)[3:4]
#[1] "disp" "hp"
packageVersion("dplyr")
#[1] ‘0.7.2’
Notice how the summarised columns change depending on whether the data is grouped and on the position of the grouping column.
Is this the same on other platforms? Is it a bug or a feature?
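The output above is consistent with the indices being resolved against the non-grouping columns only. The following is a minimal base R sketch of that interpretation. It does not call dplyr at all, and the variable names `after_gear` and `after_cyl` are made up for illustration. It is an assumption about what 0.7.2 does, not a description of its internals.

```r
# mtcars ships with base R (datasets package)
cols <- names(mtcars)

# Positions 3:4 among the columns that remain once the grouping
# column is dropped. "gear" is column 10, so nothing shifts.
after_gear <- setdiff(cols, "gear")[3:4]

# "cyl" is column 2, so every later column moves up by one.
after_cyl <- setdiff(cols, "cyl")[3:4]

print(after_gear)  # "disp" "hp"
print(after_cyl)   # "hp" "drat"
```

These match the columns that appear in the two grouped results above.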
Upvotes: 24
Views: 9956
Reputation: 47300
With version 0.7.5, this behavior can no longer be reproduced:
library(dplyr)
mtcars %>% group_by(gear) %>% summarise_at(3:4, mean)
# # A tibble: 3 x 3
# gear disp hp
# <dbl> <dbl> <dbl>
# 1 3 326. 176.
# 2 4 123. 89.5
# 3 5 202. 196.
mtcars %>% group_by(cyl) %>% summarise_at(3:4, mean)
# # A tibble: 3 x 3
# cyl disp hp
# <dbl> <dbl> <dbl>
# 1 4 105. 82.6
# 2 6 183. 122.
# 3 8 353. 209.
Upvotes: 4
Reputation: 2847
@docendodiscimus Thanks for pointing this out. Even if this behavior is intentional, the documentation doesn't explain it explicitly, and in my case it could be a source of errors. In fact, this problem had already been solved before I answered the other question, and my comment above handles it correctly using the same logic.
For now, one solution is to provide column names instead of indices. Indices can still be used, though, by passing .vars = names(.)[3:4] (or the equivalent colnames(.)[3:4]), as below:
mtcars %>%
group_by(cyl) %>%
summarise_at( .vars = colnames(.)[3:4] , mean)
mtcars %>%
group_by(cyl) %>%
summarise_at( .vars = names(.)[3:4] , mean)
## A tibble: 3 x 3
# cyl disp hp
# <dbl> <dbl> <dbl>
#1 4 105.1364 82.63636
#2 6 183.3143 122.28571
#3 8 353.1000 209.21429
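For completeness, here is a short sketch of the name-based variant mentioned above. It shows two ways to avoid positional indices: naming the columns directly with vars(), and looking the names up by position on the ungrouped data first. The variable names `target_cols`, `by_name`, and `by_lookup` are made up for this example.

```r
library(dplyr)

# Resolve positions against the ungrouped data, so the grouping
# column cannot shift them.
target_cols <- names(mtcars)[3:4]   # "disp" "hp"

# Variant 1: name the columns explicitly
by_name <- mtcars %>%
  group_by(cyl) %>%
  summarise_at(vars(disp, hp), mean)

# Variant 2: pass the character vector of names looked up beforehand
by_lookup <- mtcars %>%
  group_by(cyl) %>%
  summarise_at(target_cols, mean)
```

Both variants should summarise disp and hp regardless of where the grouping column sits.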
Upvotes: 3