Reputation: 1
I'd like to summarise columns of a data set for groups that are defined by other columns. Let me demonstrate:
Fake data:
df <- data.frame(group1 = c(0, 0, 0, 1, 0, 1),
                 group2 = c(1, 1, 1, 0, 0, 1),
                 group3 = c(0, 1, 0, 1, 0, 0),
                 rating = c(3, 5, 0, 2, 1, 2))
So, we've got six observations that might belong to groups 1, 2 and/or 3 (an observation can belong to none, one, two or all three groups; membership is denoted with a 1), and each observation has a rating ranging from 0 to 5.
I now want to determine the average rating for the members of each of the three groups. Using dplyr, I could do that group by group like this:
library(dplyr)

# attach(df) is not needed: the pipe already passes df to filter()
mean_1 <- df %>%
  filter(group1 == 1) %>%
  summarise(mean_rating = mean(rating))
and so on up to mean_3, and then afterwards I'd sort of artificially combine all these results into one big data frame. However, that seems extremely impractical, especially once you've got far more than just 3 groups.
So my question is: How do you manage to put all these mean_n results into one data frame using non-ridiculous amounts of dplyr code? Can you work with loops here (my attempts always resulted in errors)? Is the across() function the solution (if so, I couldn't find out how)?
Thanks for your help!
Upvotes: 0
Views: 34
Reputation: 5747
If you pivot your data to a long form, then you can group_by and calculate your mean with summarise:
library(tidyr)
library(dplyr)
df %>%
  pivot_longer(cols = -rating, names_to = "group") %>%
  filter(value == 1) %>%
  group_by(group) %>%
  summarise(mean = mean(rating))
# A tibble: 3 x 2
  group   mean
  <chr>  <dbl>
1 group1   2
2 group2   2.5
3 group3   3.5
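Since the question also asks about across(): here is a minimal sketch of that route, assuming every indicator column's name starts with "group" (that pattern and the name wide_means are my assumptions, not something from the question). Inside the lambda, .x is the current group column and rating is looked up in the same data frame, so each group's mean is taken over the rows flagged with 1.

```r
library(dplyr)
library(tidyr)

df <- data.frame(group1 = c(0, 0, 0, 1, 0, 1),
                 group2 = c(1, 1, 1, 0, 0, 1),
                 group3 = c(0, 1, 0, 1, 0, 0),
                 rating = c(3, 5, 0, 2, 1, 2))

# One row, one column per group: mean rating of the rows where the flag is 1.
# starts_with("group") is an assumption about how the indicator columns are named.
wide_means <- df %>%
  summarise(across(starts_with("group"), ~ mean(rating[.x == 1])))

# Optional: reshape to one row per group, matching the layout of the pivot approach
long_means <- wide_means %>%
  pivot_longer(everything(), names_to = "group", values_to = "mean")
```

One difference to be aware of: a group with no members at all would show up as NaN here, whereas with the pivot approach it is simply dropped by the filter.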
Upvotes: 1