Reputation: 6479
I found the following code in a published paper. In this specific case, I'd say the strategy works well, as it's clear and there are relatively few variables. However, the code is "a bit" repetitive, and I wonder whether there is a less repetitive way to do it that would still conform to the dplyr style and way of life.
A test case:
set.seed(42)
df <- data.frame(GR = sample(1:2, 100, replace = TRUE),
                 as.data.frame(replicate(20, rnorm(100))))
names(df)[-1] <- LETTERS[1:20]
Now a table of grouped means using aggregate:
aggregate(df[,-1], df[1],mean)
... and with dplyr:
df %>%
  group_by(GR) %>%
  summarize(mean.A = mean(A),
            mean.B = mean(B),
            mean.C = mean(C),
            mean.D = mean(D),
            mean.E = mean(E),
            # skipped 14 rows
            mean.T = mean(T))
Is there a DRY way of doing this in dplyr? I know that all programming tools in R are also available in dplyr, so the question is not about HOW to do it. Rather, I'm looking for an idiomatic dplyr way of doing this. I've seen similar but much longer examples in real life.
Upvotes: 2
Views: 1660
Reputation: 1983
How about this:
df %>%
  group_by(GR) %>%
  summarise_all(.funs = mean) %>%
  # prefix only the summarised columns; keep the grouping column named GR
  setNames(c("GR", paste("mean", colnames(.)[-1], sep = ".")))
Upvotes: 2
Reputation: 887851
When there are multiple columns to summarise, use summarise_all if every column except the grouping variable needs to be summarised with the same function:
df %>%
group_by(GR) %>%
summarise_all(funs(mean = mean(., na.rm = TRUE)))
If we need to do this only on selected columns, use summarise_at:
df %>%
group_by(GR) %>%
summarise_at(vars(A, B, C, D, E), funs(mean = mean(., na.rm = TRUE)))
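For readers on a more recent dplyr (1.0.0 onward, as far as I know), funs() is deprecated and the scoped verbs are superseded by across(). A minimal sketch of the same two summaries, reusing the question's test data; the `.names` pattern is my choice to reproduce the `mean.A` style names and is not from the original answer:

```r
library(dplyr)

# Test data from the question
set.seed(42)
df <- data.frame(GR = sample(1:2, 100, replace = TRUE),
                 as.data.frame(replicate(20, rnorm(100))))
names(df)[-1] <- LETTERS[1:20]

# All non-grouping columns; everything() skips the grouping variable
res_all <- df %>%
  group_by(GR) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE),
                   .names = "mean.{.col}"))

# Only selected columns (A through E)
res_some <- df %>%
  group_by(GR) %>%
  summarise(across(A:E, ~ mean(.x, na.rm = TRUE),
                   .names = "mean.{.col}"))
```

The grouping column keeps its name GR, and the other columns come out as mean.A, mean.B, and so on.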
Also, check summarise_if when you want to apply the function only to certain types of columns.
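A small sketch of summarise_if, assuming a toy data frame (`df2`, with a hypothetical non-numeric `label` column that is not part of the question's data) so the type filter has something to exclude:

```r
library(dplyr)

# Hypothetical data: one numeric column and one non-numeric column
df2 <- data.frame(GR    = c(1, 1, 2, 2),
                  x     = c(1, 2, 3, 4),
                  label = c("a", "b", "c", "d"))

# Summarise only the numeric columns; label is dropped from the result
res_num <- df2 %>%
  group_by(GR) %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)
```

Here only x is averaged per group, because label does not satisfy is.numeric.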
Upvotes: 3