Reputation: 143
I know this must be super easy, but I'm having trouble finding the right dplyr commands to do this. Let's say I want to group a dataset by two variables, and then summarize the count for each row. For this we simply have:
mtcars %>% group_by(cyl, mpg) %>% summarize(Count = n())
This will generate a dataframe with 27 rows for the three variables cyl
, mpg
, and Count
. What I'd like to do next is summarize the average mpg
for each of the three cyl
values. Keep in mind that each row may contain a Count
greater than one which must be considered when calculating the average. My data frame should have 3 rows of 2 variables cyl
, and Avg_mpg
. Can someone give me the short code chuck that will do this?
Thank you in advance.
Upvotes: 10
Views: 7303
Reputation: 13570
You are effectively performing a simple mean because the weights are the grouping variable:
library(dplyr)
options(pillar.sigfig=10) # To check they are identical results
mtcars %>%
group_by(cyl) %>%
summarise(avg_mpg = mean(mpg))
Output:
The result is identical to the ones proposed above:
# A tibble: 3 x 2
cyl avg_mpg
<dbl> <dbl>
1 4 26.66363636
2 6 19.74285714
3 8 15.1
If you need a weighted mean based on another variable:
mtcars %>%
group_by(cyl) %>%
summarise(avg_mpg = weighted.mean(mpg, disp))
# A tibble: 3 x 2
cyl avg_mpg
<dbl> <dbl>
1 4 25.81985300
2 6 19.77197631
3 8 14.86285148
Upvotes: 2
Reputation: 388807
If I have understood you correctly, you need weighted.mean
library(dplyr)
mtcars %>%
group_by(cyl, mpg) %>%
summarize(Count = n()) %>%
group_by(cyl) %>%
summarise(avg_mpg = weighted.mean(mpg, Count))
# A tibble: 3 x 2
# cyl avg_mpg
# <dbl> <dbl>
#1 4.00 26.7
#2 6.00 19.7
#3 8.00 15.1
which is equivalent to
mtcars %>%
group_by(cyl, mpg) %>%
summarize(Count = n()) %>%
group_by(cyl) %>%
summarise(avg_mpg = sum(mpg * Count)/sum(Count))
Upvotes: 10