Reputation: 9763
Can someone explain what I am doing wrong here:
library(dplyr)
temp<-data.frame(a=c(1,2,3,1,2,3,1,2,3),b=c(1,2,3,1,2,3,1,2,3))
temp%>%group_by(temp[,1])%>%summarise(n=n(),mean=mean(temp[,2],na.rm=T))
# A tibble: 3 × 3
  `temp[, 1]`     n  mean
        <dbl> <int> <dbl>
1           1     3     2
2           2     3     2
3           3     3     2
I expected the means to be:
1 1
2 2
3 3
Instead, the mean seems to be the global mean (the sum of all values in column 2 divided by the number of rows): 18/9 = 2.
How do I get the mean to be what I expected?
Upvotes: 2
Views: 255
Reputation: 1703
Your problem is that you are calculating the mean of temp[,2], which is the
whole column of the original data frame, instead of the column within each group
(mean(temp[,2],na.rm=T) gives the same value for every group, regardless of
context). Refer to the column by its name instead:
> temp %>% group_by(temp[,1]) %>% summarise(n=n(), mean=mean(b, na.rm=T))
# A tibble: 3 × 3
  `temp[, 1]`     n  mean
        <dbl> <int> <dbl>
1           1     3     1
2           2     3     2
3           3     3     3
Furthermore, it is more common to use the column name in group_by as well
(here a, the first column, which is what temp[,1] refers to):
> temp %>% group_by(a) %>% summarise(n=n(), mean=mean(b, na.rm=T))
# A tibble: 3 × 3
      a     n  mean
  <dbl> <int> <dbl>
1     1     3     1
2     2     3     2
3     3     3     3
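To see that temp[, 2] inside summarise always refers to the full column while b refers to the current group, here is a minimal sketch that compares the two. The output column names (n_group, n_full, mean_group, mean_full) are made up for illustration.

```r
library(dplyr)

temp <- data.frame(a = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                   b = c(1, 2, 3, 1, 2, 3, 1, 2, 3))

res <- temp %>%
  group_by(a) %>%
  summarise(
    n_group    = length(b),          # b is only the rows of the current group
    n_full     = length(temp[, 2]),  # temp[, 2] is the whole column every time
    mean_group = mean(b),            # per-group mean
    mean_full  = mean(temp[, 2])     # global mean, repeated for each group
  )

print(res)
```

Each group should report n_group = 3 but n_full = 9, which is why the original code returned the global mean of 2 in every row.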
Upvotes: 3
Reputation: 1
Always remember to use column names in dplyr. You will run into problems like these when you try to reference columns by their index rather than by name. So instead of the code you used:
temp%>%group_by(temp[,1])%>%summarise(n=n(),mean=mean(temp[,2],na.rm=T))
try the following, which gives the expected result:
temp%>%group_by(a)%>%summarise(n=n(),mean=mean(b))
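If only the column positions are known, one option is to look up the names first and pass them to dplyr. This is a minimal sketch, assuming dplyr 1.0 or later; group_col and value_col are hypothetical helper variables, not part of the original code.

```r
library(dplyr)

temp <- data.frame(a = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                   b = c(1, 2, 3, 1, 2, 3, 1, 2, 3))

# Translate positions into names once, outside the pipeline
group_col <- names(temp)[1]  # "a"
value_col <- names(temp)[2]  # "b"

res_by_name <- temp %>%
  group_by(across(all_of(group_col))) %>%       # group by the named column
  summarise(n = n(),
            mean = mean(.data[[value_col]],     # .data[[...]] is evaluated per group
                        na.rm = TRUE))

print(res_by_name)
```

Because .data[[value_col]] is resolved inside each group, the mean is computed per group rather than over the whole column.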
Upvotes: 0
Reputation: 887223
An alternative approach is to use data.table:
library(data.table)
setDT(temp)[, .(n = .N, mean = mean(b)), by = a]
#   a n mean
#1: 1 3    1
#2: 2 3    2
#3: 3 3    3
Upvotes: 1