JerryWho

Reputation: 3105

dplyr: getting group_by-column even when not selecting it

When selecting columns, I get a column I haven't selected; it is one of the group_by columns:

library(magrittr)
library(dplyr)

df <- data.frame(i=c(1,1,1,1,2,2,2,2), j=c(1,2,1,2,1,2,1,2), x=runif(8))

df %>% 
  group_by(i,j) %>%
  summarize(s=sum(x)) %>%
  filter(i==1) %>%
  select(s)

I get column i even though I haven't selected it:

  i         s
1 1 0.8355195
2 1 0.9322474

Why does this happen (and why column i but not column j?), and how can I avoid it? Okay, I could filter at the beginning...

Upvotes: 3

Views: 1810

Answers (1)

talat

Reputation: 70266

That's because grouping variables are carried along by default. See the dplyr vignette:

Grouping affects the verbs as follows: grouped select() is the same as ungrouped select(), except that grouping variables are always retained.
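
As a small illustration of the quoted rule, here is a sketch comparing select() on a grouped and an ungrouped data frame. The data frame `small` and its values are made up for this example; they are not from the question.

```r
library(dplyr)

# Hypothetical toy data, only for illustration
small <- data.frame(i = c(1, 1, 2, 2), x = c(0.1, 0.2, 0.3, 0.4))

# Ungrouped: select() returns only the requested column
ungrouped_sel <- small %>% select(x)
names(ungrouped_sel)

# Grouped: the grouping variable i is retained even though only x was selected
# (dplyr also prints a message saying it added the missing grouping variable)
grouped_sel <- small %>% group_by(i) %>% select(x)
names(grouped_sel)
```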

Note that each summarize() peels off one layer of grouping (in your case, j), so after the summarize() your data is grouped only by i. That is why i, and not j, shows up in the output. If you don't want that, you can ungroup the data before selecting s:

require(dplyr)
df %>% 
  group_by(i,j) %>%
  summarize(s=sum(x)) %>%
  ungroup() %>%
  filter(i==1) %>%
  select(s)
#Source: local data frame [2 x 1]
#
#         s
#1 1.129867
#2 1.265131
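
To see the peeling for yourself, group_vars() reports the current grouping at each step. The sketch below assumes dplyr 1.0.0 or later, where summarize() accepts a `.groups` argument; with `.groups = "drop"` the result comes back ungrouped, so a separate ungroup() call is not needed. The seed is an addition here so that the run is reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, not part of the original question
df <- data.frame(i = c(1, 1, 1, 1, 2, 2, 2, 2),
                 j = c(1, 2, 1, 2, 1, 2, 1, 2),
                 x = runif(8))

by_ij <- df %>% group_by(i, j)
group_vars(by_ij)    # both i and j

# "drop_last" spells out the default: only the last grouping variable (j) is peeled off
summed <- by_ij %>% summarize(s = sum(x), .groups = "drop_last")
group_vars(summed)   # only i remains

# Dropping all grouping inside summarize() means select(s) keeps just s
res <- by_ij %>%
  summarize(s = sum(x), .groups = "drop") %>%
  filter(i == 1) %>%
  select(s)
names(res)
```

On older dplyr versions without `.groups`, the explicit ungroup() shown above is the way to go.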

Upvotes: 5
