lebatsnok

Reputation: 6479

Grouped means in dplyr

I found the following code in a published paper. In this specific case, I'd say the strategy works well: it's clear and there are relatively few variables. However, the code is "a bit" repetitive, and I wonder whether there is a less repetitive way to do it that still conforms to the dplyr style and way of life.

[Image: the code from the paper]

A test case:

library(dplyr)  # needed for the %>% / group_by / summarize example below

set.seed(42)
df <- data.frame(GR = sample(1:2, 100, replace = TRUE),
                 as.data.frame(replicate(20, rnorm(100))))
names(df)[-1] <- LETTERS[1:20]

Now a table of grouped means using aggregate:

aggregate(df[, -1], df[1], mean)

... and with dplyr:

df %>% group_by(GR) %>% summarize(mean.A=mean(A),
                                  mean.B=mean(B),
                                  mean.C=mean(C),
                                  mean.D=mean(D),
                                  mean.E=mean(E),
                                  # skipped 14 rows
                                  mean.T=mean(T))

Is there a DRY way of doing this in dplyr? I know that all of R's programming tools remain available when using dplyr, so the question is not about HOW to do it. Rather, I'm looking for an idiomatic dplyr way of doing this. I've seen similar but much longer examples in real life.

Upvotes: 2

Views: 1660

Answers (2)

Shinobi_Atobe

Reputation: 1983

How about this:

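df %>%
  group_by(GR) %>%
  summarise_all(.funs = mean) %>%
  # prefix only the summarised columns; setNames() on all columns would also rename GR to mean.GR
  rename_at(vars(-GR), ~ paste("mean", ., sep = "."))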

Upvotes: 2

akrun

Reputation: 887851

When there are multiple columns to summarise, use summarise_all if every column other than the grouping variable needs to be summarised with the same function:

df %>%
   group_by(GR) %>%
   # funs() is deprecated since dplyr 0.8.0; a named list of lambdas does the same job
   summarise_all(list(mean = ~ mean(., na.rm = TRUE)))

If we need to do this only on selected columns, use summarise_at:

df %>%
   group_by(GR) %>%
   # funs() is deprecated since dplyr 0.8.0; a named list of lambdas does the same job
   summarise_at(vars(A, B, C, D, E), list(mean = ~ mean(., na.rm = TRUE)))
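In newer dplyr the scoped verbs are superseded by across(). A minimal sketch of both cases, assuming dplyr 1.0.0 or later and the test data from the question; the .names pattern "mean.{.col}" is chosen here only to reproduce the mean.A, mean.B, ... naming used in the question:

```r
library(dplyr)

# Test data from the question
set.seed(42)
df <- data.frame(GR = sample(1:2, 100, replace = TRUE),
                 as.data.frame(replicate(20, rnorm(100))))
names(df)[-1] <- LETTERS[1:20]

# All non-grouping columns (everything() skips the grouping variable GR)
res_all <- df %>%
  group_by(GR) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE),
                   .names = "mean.{.col}"))

# Only a selected range of columns
res_sel <- df %>%
  group_by(GR) %>%
  summarise(across(A:E, ~ mean(.x, na.rm = TRUE),
                   .names = "mean.{.col}"))
```

The grouping column keeps its name, so no renaming step is needed afterwards.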

Also, see summarise_if for applying the function only to columns of a certain type.
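A small sketch of summarise_if, assuming a data frame that mixes numeric and character columns. The data frame df2 and its label column are hypothetical, made up only for this illustration, since the test data in the question is entirely numeric:

```r
library(dplyr)

# Hypothetical data: two numeric columns plus one character column
set.seed(42)
df2 <- data.frame(GR = sample(1:2, 100, replace = TRUE),
                  A = rnorm(100),
                  B = rnorm(100),
                  label = sample(letters, 100, replace = TRUE),
                  stringsAsFactors = FALSE)

# Only columns for which is.numeric() returns TRUE are summarised;
# the character column is dropped from the result
res_if <- df2 %>%
  group_by(GR) %>%
  summarise_if(is.numeric, list(mean = ~ mean(., na.rm = TRUE)))
```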

Upvotes: 3
