T. Kreischwurst

Reputation: 1

How to apply summarise() to members of various distinct groups at once in R using dplyr?

I'd like to summarise a column of a data set for distinct groups that are defined by other columns. Let me demonstrate:

Fake data:

df <- data.frame(group1 = c(0, 0, 0, 1, 0, 1),
                 group2 = c(1, 1, 1, 0, 0, 1),
                 group3 = c(0, 1, 0, 1, 0, 0),
                 rating = c(3, 5, 0, 2, 1, 2))

So, we've got six observations that may belong to groups 1, 2 and/or 3 (an observation can belong to none, one, two or all three groups; membership is denoted with a 1), and each observation has a rating ranging from 0 to 5.

I now want to determine the average rating for the members of each of the three groups. Using dplyr, I could do that group by group like this:

library(dplyr)

# Mean rating for members of group1 only (no attach() needed with dplyr)
mean_1 <- df %>%
  filter(group1 == 1) %>%
  summarise(mean_rating = mean(rating))

and so on up to mean_3, and then I'd sort of artificially combine all these results into one big data frame. However, that seems extremely impractical, especially once you've got far more than just 3 groups.

So my question is: How do you put all these mean_n results into one data frame without a ridiculous amount of dplyr code? Can you work with loops here (my attempts always resulted in errors)? Is the across() function the solution (if so, I couldn't figure out how)?

Thanks for your help!

Upvotes: 0

Views: 34

Answers (1)

Ben Norris

Reputation: 5747

If you pivot your data to a long form, you can keep only the rows where value is 1 (the actual memberships), then group_by() the group name and calculate the mean with summarise():

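library(tidyr)
library(dplyr)

df %>%
  # one row per observation/group pair; the 0/1 flag lands in `value`
  pivot_longer(cols = -rating, names_to = "group") %>%
  # keep only rows where the observation belongs to the group
  filter(value == 1) %>%
  group_by(group) %>%
  summarise(mean = mean(rating))
#> # A tibble: 3 x 2
#>   group   mean
#>   <chr>  <dbl>
#> 1 group1   2
#> 2 group2   2.5
#> 3 group3   3.5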
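
Since the question also asks about across(): a possible alternative, sketched here under the assumption that dplyr 1.0.0 or later is installed and that every membership column starts with "group", is to subset rating inside across() without reshaping. The name wide_means is only illustrative.

```r
library(dplyr)

# Same fake data as in the question
df <- data.frame(group1 = c(0, 0, 0, 1, 0, 1),
                 group2 = c(1, 1, 1, 0, 0, 1),
                 group3 = c(0, 1, 0, 1, 0, 0),
                 rating = c(3, 5, 0, 2, 1, 2))

# For each group column, average the ratings of the rows where that column is 1.
# `wide_means` is a hypothetical name chosen for this sketch.
wide_means <- df %>%
  summarise(across(starts_with("group"), ~ mean(rating[.x == 1])))

wide_means
#   group1 group2 group3
# 1      2    2.5    3.5
```

This gives one column per group rather than one row per group. One difference to keep in mind: a group with no members shows up as NaN here, whereas the pivot_longer() version simply drops it from the result.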

Upvotes: 1
