Reputation: 3105
When selecting columns, I get one column that I haven't selected, and it is one of the group_by columns:
library(magrittr)
library(dplyr)
df <- data.frame(i=c(1,1,1,1,2,2,2,2), j=c(1,2,1,2,1,2,1,2), x=runif(8))
df %>%
group_by(i,j) %>%
summarize(s=sum(x)) %>%
filter(i==1) %>%
select(s)
I get column i even though I haven't selected it:
i s
1 1 0.8355195
2 1 0.9322474
Why does this happen (and why is column j not kept as well), and how can I avoid it? I know I could filter at the beginning.
Upvotes: 3
Views: 1810
Reputation: 70266
That's because the grouping variable is carried along by default. Please see the dplyr vignette:
"Grouping affects the verbs as follows: grouped select() is the same as ungrouped select(), except that grouping variables are always retained."
Note that each summarize peels off one layer of grouping (in your case, j), so after the summarize your data is only grouped by i, and that is why i is printed in the output. If you don't want that, you can ungroup the data before selecting s:
require(dplyr)
df %>%
group_by(i,j) %>%
summarize(s=sum(x)) %>%
ungroup() %>%
filter(i==1) %>%
select(s)
#Source: local data frame [2 x 1]
#
# s
#1 1.129867
#2 1.265131
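To see the "peeling" for yourself, you can inspect the grouping at each step. The following is only a sketch: the set.seed() call and the intermediate names grouped, summarized, res and res_drop are added here for illustration and are not part of the original question, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed added only to make the sketch reproducible
df <- data.frame(i = c(1,1,1,1,2,2,2,2), j = c(1,2,1,2,1,2,1,2), x = runif(8))

# Before summarizing, the data is grouped by both columns
grouped <- df %>% group_by(i, j)
group_vars(grouped)      # "i" "j"

# summarize() drops the last grouping level (j), so only i remains
summarized <- grouped %>% summarize(s = sum(x))
group_vars(summarized)   # "i"

# After ungroup(), select() no longer has a grouping column to retain
res <- summarized %>% ungroup() %>% filter(i == 1) %>% select(s)
names(res)               # "s"

# Alternative (assumes dplyr >= 1.0.0): drop all grouping inside summarize()
res_drop <- df %>%
  group_by(i, j) %>%
  summarize(s = sum(x), .groups = "drop") %>%
  filter(i == 1) %>%
  select(s)
names(res_drop)          # "s"
```

Both variants should return a single column s with two rows; which one you prefer mostly depends on whether you still need the grouping by i for later steps.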
Upvotes: 5