Say I have a data frame like this in R:
df <- data.frame(factor1 = c("A", "B", "B", "C"),
                 factor2 = c("M", "F", "F", "F"),
                 factor3 = c("0", "1", "1", "0"),
                 value = c(23, 32, 4, 1))
I want to get a summary statistic in dplyr, grouped by one variable, like so (but more complicated):
df %>%
  group_by(factor1) %>%
  summarize(mean = mean(value))
Now I'd like to do this for all factor columns (think 100 factor variables). Is there a way to do this within dplyr? I also thought about writing a for loop over names(df), but then I get the variable names as strings, and group_by() doesn't accept strings.
Just put your data in long form.
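library(dplyr)
library(tidyr)

# Stack all factor columns into two columns: "factor" (column name) and "level" (its value)
df %>%
  gather(key = factor, value = level, -value) %>%
  group_by(factor, level) %>%
  summarize(mean = mean(value))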
#    factor level     mean
#     (chr) (chr)    (dbl)
# 1 factor1     A 23.00000
# 2 factor1     B 18.00000
# 3 factor1     C  1.00000
# 4 factor2     F 12.33333
# 5 factor2     M 23.00000
# 6 factor3     0 12.00000
# 7 factor3     1 18.00000
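gather() still works, but in tidyr 1.0.0 and later it is superseded by pivot_longer(). A roughly equivalent sketch, assuming current tidyr and dplyr releases; the data frame is built with stringsAsFactors = FALSE so all three factor columns are plain character vectors, and the name long_means is just illustrative:

```r
library(dplyr)
library(tidyr)

df <- data.frame(factor1 = c("A", "B", "B", "C"),
                 factor2 = c("M", "F", "F", "F"),
                 factor3 = c("0", "1", "1", "0"),
                 value = c(23, 32, 4, 1),
                 stringsAsFactors = FALSE)

# Every column except `value` is pivoted: its name goes to `factor`, its value to `level`
long_means <- df %>%
  pivot_longer(-value, names_to = "factor", values_to = "level") %>%
  group_by(factor, level) %>%
  summarize(mean = mean(value), .groups = "drop")

print(long_means)
```

The same seven rows as above come out, and the approach scales to 100 factor columns without listing them.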
If you would rather build a loop over the column names, the "Programming with dplyr" vignette is the right place to start.
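For reference, a minimal sketch of that loop, assuming dplyr 1.0.0 or later: inside group_by(), the .data pronoun lets you refer to a column by a string as .data[[col]]. It uses lapply() instead of an explicit for loop so that each factor column yields one summary table; factor_cols and results are illustrative names, not part of any API.

```r
library(dplyr)

df <- data.frame(factor1 = c("A", "B", "B", "C"),
                 factor2 = c("M", "F", "F", "F"),
                 factor3 = c("0", "1", "1", "0"),
                 value = c(23, 32, 4, 1),
                 stringsAsFactors = FALSE)

# All columns except the measurement are treated as grouping variables
factor_cols <- setdiff(names(df), "value")

# .data[[col]] turns the string in `col` into a column reference
results <- lapply(factor_cols, function(col) {
  df %>%
    group_by(.data[[col]]) %>%
    summarize(mean = mean(value), .groups = "drop")
})
names(results) <- factor_cols

results$factor2
```

This keeps the per-column summaries separate, which can be handy when each one later needs different treatment; the long-format version is usually simpler when you just want one combined table.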