talat
talat

Reputation: 70256

Using dplyr summarise_at with column index

I noticed that when supplying column indices to dplyr::summarize_at the column to be summarized is determined excluding the grouping column(s). I wonder if that is how it's supposed to be since by this design, using the correct column index depends on whether the summarising column(s) are positioned before or after the grouping columns.

Here's an example:

library(dplyr)
data("mtcars")

# grouping column after summarise columns
mtcars %>% group_by(gear) %>% summarise_at(3:4, mean)
## A tibble: 3 x 3
#   gear     disp       hp
#  <dbl>    <dbl>    <dbl>
#1     3 326.3000 176.1333
#2     4 123.0167  89.5000
#3     5 202.4800 195.6000

# grouping columns before summarise columns
mtcars %>% group_by(cyl) %>% summarise_at(3:4, mean)
## A tibble: 3 x 3
#    cyl        hp     drat
#  <dbl>     <dbl>    <dbl>
#1     4  82.63636 4.070909
#2     6 122.28571 3.585714
#3     8 209.21429 3.229286

# no grouping columns
mtcars %>% summarise_at(3:4, mean)
#      disp       hp
#1 230.7219 146.6875

# actual third & fourth columns
names(mtcars)[3:4]
#[1] "disp" "hp"  

packageVersion("dplyr")
#[1] ‘0.7.2’

Notice how the summarised columns change depending on grouping and position of the grouping column.

Is this the same on other platforms? Is it a bug or a feature?

Upvotes: 24

Views: 9956

Answers (2)

moodymudskipper
moodymudskipper

Reputation: 47300

with version 0.7.5 this behavior can't be reproduced anymore:

  library(dplyr)
  mtcars %>% group_by(gear) %>% summarise_at(3:4, mean)
  # # A tibble: 3 x 3
  #    gear  disp    hp
  #   <dbl> <dbl> <dbl>
  # 1     3  326. 176. 
  # 2     4  123.  89.5
  # 3     5  202. 196. 

  mtcars %>% group_by(cyl) %>% summarise_at(3:4, mean)
  # # A tibble: 3 x 3
  #     cyl  disp    hp
  #   <dbl> <dbl> <dbl>
  # 1     4  105.  82.6
  # 2     6  183. 122. 
  # 3     8  353. 209. 

Upvotes: 4

GoGonzo
GoGonzo

Reputation: 2847

@docendodiscimus thanks for pointing this out, because even if this feature was intentional, documentation doesn't explicitly explain this and in my case could be source of errors. Actually, this problem was solved before answering on the other question, and my comment above does it properly with the same logic.


At this moment, possible solution is to provide names instead of indexes. But one is still able to make it using indexes just by adding few symbols .vars = names(.)[3:4], like below:

mtcars %>% 
  group_by(cyl) %>% 
  summarise_at( .vars = colnames(.)[3:4] , mean)

mtcars %>% 
  group_by(cyl) %>% 
  summarise_at( .vars = names(.)[3:4] , mean)


## A tibble: 3 x 3
#    cyl     disp        hp
#  <dbl>    <dbl>     <dbl>
#1     4 105.1364  82.63636
#2     6 183.3143 122.28571
#3     8 353.1000 209.21429

Upvotes: 3

Related Questions