Grouped means in dplyr

Question

I found the following code in a published paper. In this specific case, I'd say the strategy works well: it's clear and there are relatively few variables. However, the code is "a bit" repetitive, and I wonder whether there is a less repetitive way to do it that still conforms to the dplyr style and way of life.

A test case:

set.seed(42)
df <- data.frame(GR=sample(1:2, 100, replace=TRUE),
       as.data.frame(replicate(20, rnorm(100))))
names(df)[-1] <- LETTERS[1:20]

Now a table of grouped means using aggregate:

aggregate(df[, -1], df[1], mean)

... and with dplyr:

df %>% group_by(GR) %>% summarize(mean.A=mean(A),
                                  mean.B=mean(B),
                                  mean.C=mean(C),
                                  mean.D=mean(D),
                                  mean.E=mean(E),
                                  # skipped 14 rows
                                  mean.T=mean(T))

Is there a DRY way of doing this in dplyr? I know that all programming tools in R are also available in dplyr, so the question is not about HOW to do it. Rather, I'm looking for an idiomatic dplyr way of doing this. I've seen similar but much longer examples in real life.

akrun · Accepted Answer

When there are multiple columns to summarise, use summarise_all if every column except the grouping variable needs to be summarised with the same function:

# funs() is deprecated since dplyr 0.8.0; a named list of formulas replaces it
df %>%
   group_by(GR) %>%
   summarise_all(list(mean = ~ mean(., na.rm = TRUE)))

If we need to do this only on selected columns, use summarise_at:

df %>%
   group_by(GR) %>%
   summarise_at(vars(A, B, C, D, E), list(mean = ~ mean(., na.rm = TRUE)))
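In dplyr 1.0.0 and later, the scoped verbs shown above are superseded by across(). The sketch below is one possible equivalent, not part of the original answer: it rebuilds the test data from the question, and the result names res_all and res_some are only illustrative. The .names argument is used here to reproduce the mean.A, mean.B, ... naming from the question.

```r
library(dplyr)

# Test data from the question
set.seed(42)
df <- data.frame(GR = sample(1:2, 100, replace = TRUE),
                 as.data.frame(replicate(20, rnorm(100))))
names(df)[-1] <- LETTERS[1:20]

# All non-grouping columns (counterpart of summarise_all)
res_all <- df %>%
  group_by(GR) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE),
                   .names = "mean.{.col}"))

# Only columns A through E (counterpart of summarise_at)
res_some <- df %>%
  group_by(GR) %>%
  summarise(across(A:E, ~ mean(.x, na.rm = TRUE),
                   .names = "mean.{.col}"))
```

On grouped data, everything() skips the grouping variable, so GR does not need to be excluded by hand.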

Also, check summarise_if when the function should be applied only to columns of certain types.
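A minimal sketch of the type-based variant follows. The test data has only numeric columns, so a character column called label is added here purely as a hypothetical illustration; it is not part of the question. The second pipeline shows what should be the across() counterpart, assuming dplyr 1.0.0 or later.

```r
library(dplyr)

# Test data from the question
set.seed(42)
df <- data.frame(GR = sample(1:2, 100, replace = TRUE),
                 as.data.frame(replicate(20, rnorm(100))))
names(df)[-1] <- LETTERS[1:20]

# Hypothetical non-numeric column, added only to show the type filter
df$label <- sample(c("x", "y"), 100, replace = TRUE)

# Summarise numeric columns only; label is dropped from the result
res_if <- df %>%
  group_by(GR) %>%
  summarise_if(is.numeric, list(mean = ~ mean(., na.rm = TRUE)))

# across() counterpart, selecting by type with where()
res_where <- df %>%
  group_by(GR) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE),
                   .names = "{.col}_mean"))
```

Both results keep GR plus one mean per numeric column, and the character column does not appear in either.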
