dplyr: getting group_by-column even when not selecting it

Question

When I select columns, the result includes a column I did not select; it is one of the group_by columns:

library(magrittr)
library(dplyr)

df <- data.frame(i=c(1,1,1,1,2,2,2,2), j=c(1,2,1,2,1,2,1,2), x=runif(8))

df %>% 
  group_by(i,j) %>%
  summarize(s=sum(x)) %>%
  filter(i==1) %>%
  select(s)

I get column i even though I haven't selected it:

  i         s
1 1 0.8355195
2 1 0.9322474

Why does this happen (and why i but not j?), and how can I avoid it? I know I could filter at the beginning...

talat · Accepted Answer

That's because grouping variables are retained by default. See the dplyr vignette:

Grouping affects the verbs as follows: grouped select() is the same as ungrouped select(), except that grouping variables are always retained.
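
As a rough illustration of the quoted behaviour, here is a minimal sketch. It reuses the shape of the data frame from the question, but with fixed x values (my assumption, so the result is reproducible), and the names grouped and plain are only for illustration:

```r
library(dplyr)

# Same layout as the question's df, with fixed x values instead of runif()
df <- data.frame(i = c(1,1,1,1,2,2,2,2), j = c(1,2,1,2,1,2,1,2), x = 1:8)

# Grouped select(): the grouping variable i is added back although only x was requested
grouped <- df %>%
  group_by(i) %>%
  select(x)
names(grouped)

# Ungrouped select(): only the requested column is returned
plain <- df %>%
  select(x)
names(plain)
```

dplyr normally prints a message that it is adding the missing grouping variable when this happens.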

Note that each summarize peels off one layer of grouping (in your case, j), so after the summarize your data is grouped only by i. That is why i appears in the output and j does not. If you don't want that, you can ungroup the data before selecting s:

require(dplyr)
df %>% 
  group_by(i,j) %>%
  summarize(s=sum(x)) %>%
  ungroup() %>%
  filter(i==1) %>%
  select(s)
#Source: local data frame [2 x 1]
#
#         s
#1 1.129867
#2 1.265131
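
To check which grouping is left after the summarize, group_vars() can help. The sketch below also shows a possible alternative to ungroup(): the .groups argument of summarize, which assumes dplyr 1.0.0 or later is installed. The x values are fixed here (my assumption, not the question's runif() data), and the names summed and res are only for illustration:

```r
library(dplyr)

# Same layout as the question's df, with fixed x values instead of runif()
df <- data.frame(i = c(1,1,1,1,2,2,2,2), j = c(1,2,1,2,1,2,1,2), x = 1:8)

summed <- df %>%
  group_by(i, j) %>%
  summarize(s = sum(x))

# Only the first grouping variable remains after one summarize
group_vars(summed)

# Assumes dplyr >= 1.0.0: drop all grouping directly in summarize
res <- df %>%
  group_by(i, j) %>%
  summarize(s = sum(x), .groups = "drop") %>%
  filter(i == 1) %>%
  select(s)
names(res)
```

With the grouping dropped, select(s) returns only s, the same as in the ungroup() version above.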
