Looping and applying the same dplyr function to many columns

Question

Say I have a data frame like this in R:

df <- data.frame(factor1 = c("A","B","B","C"),
                factor2 = c("M","F","F","F"),
                factor3 = c("0", "1","1","0"),
                value = c(23,32,4,1))

I want to get a summary statistic in dplyr grouped by one variable, like so (but more complicated):

library(dplyr)

df %>%
    group_by(factor1) %>%
    summarize(mean = mean(value))

Now I'd like to do this for every factor column (think 100 factor variables). Is there a way to do this within dplyr? I also considered a for loop over names(df), but that gives me the column names as strings, and group_by() doesn't accept strings.

Gregor Thomas · Accepted Answer

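Just put your data in long form, so that a single grouped summary covers every factor column at once.

library(dplyr)
library(tidyr)

# Stack every column except `value` into factor/level pairs,
# then average `value` within each pair.
df %>%
    gather(key = factor, value = level, -value) %>%
    group_by(factor, level) %>%
    summarize(mean = mean(value))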

#    factor level     mean
#     (chr) (chr)    (dbl)
# 1 factor1     A 23.00000
# 2 factor1     B 18.00000
# 3 factor1     C  1.00000
# 4 factor2     F 12.33333
# 5 factor2     M 23.00000
# 6 factor3     0 12.00000
# 7 factor3     1 18.00000
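
In current tidyr, gather() is superseded by pivot_longer(). The same reshaping might be written roughly as follows. This is a sketch that assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the .groups argument). The name long_means is only illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(factor1 = c("A", "B", "B", "C"),
                 factor2 = c("M", "F", "F", "F"),
                 factor3 = c("0", "1", "1", "0"),
                 value = c(23, 32, 4, 1))

# Stack every column except `value` into factor/level pairs,
# then average `value` within each pair.
long_means <- df %>%
    pivot_longer(cols = -value, names_to = "factor", values_to = "level") %>%
    group_by(factor, level) %>%
    summarize(mean = mean(value), .groups = "drop")

print(long_means)
```

The result has one row per factor/level combination, like the gather() output above.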

To actually build a loop instead, the Programming with dplyr vignette is the right place to start.
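
For reference, one possible shape of such a loop is sketched below. It assumes dplyr 1.0.0 or later, where across(all_of(col)) lets group_by() take a column name held in a string. The names factor_cols and results are made up for this example, and the sketch treats every column other than value as a grouping variable.

```r
library(dplyr)

df <- data.frame(factor1 = c("A", "B", "B", "C"),
                 factor2 = c("M", "F", "F", "F"),
                 factor3 = c("0", "1", "1", "0"),
                 value = c(23, 32, 4, 1))

# Assumption: every column except `value` is a grouping variable.
factor_cols <- setdiff(names(df), "value")

# Build one summary table per grouping column.
# all_of() selects the column whose name is stored in the string `col`.
results <- lapply(factor_cols, function(col) {
    df %>%
        group_by(across(all_of(col))) %>%
        summarize(mean = mean(value), .groups = "drop")
})
names(results) <- factor_cols

print(results$factor1)
```

This returns a list of separate data frames, one per factor, rather than one long table. That can be handier when each grouping column needs its own downstream handling.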
