Wietze314

R dplyr: summarise counts of multiple factors

I have a data frame that I want to summarise with dplyr. It contains multiple factors, and I want to report the count of each factor level per group.

Is there a way to do the following with dplyr without having to name each factor level in the summarise() call?

library(dplyr)

set.seed(123)

s <- rbinom(100,1,0.5)
s <- factor(s,0:1,c('M','F'))
a <- sample(1:4,100,TRUE)
a <- factor(a,1:4,c('oldest','old','young','youngest'))
w <- rnorm(100,40,10)
g <- rep(1:2,each=50)

df <- data.frame(sex=s, age=a, weight=w, group=g)



sm <- df %>% group_by(group) %>% summarise(
  male = sum(ifelse(sex=='M',1,0))
  ,female = sum(ifelse(sex=='F',1,0))
  ,youngest = sum(ifelse(age=='youngest',1,0))
  ,young = sum(ifelse(age=='young',1,0))
  ,old = sum(ifelse(age=='old',1,0))
  ,oldest = sum(ifelse(age=='oldest',1,0))
  ,weight = mean(weight)
)

print(t(sm))

Result:

           [,1]     [,2]
group     1.000  2.00000
male     29.000 24.00000
female   21.000 26.00000
youngest 12.000  8.00000
young    13.000 17.00000
old      12.000 18.00000
oldest   13.000  7.00000
weight   37.461 40.38807

Answers (2)

Edward R. Mazurek

Using dplyr plus tidyr's spread() (albeit in a circuitous, hacky way!):

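library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr

df %>%
    mutate(row_number1 = row_number(), row_number2 = row_number()) %>%
    spread(sex, row_number1) %>%   # one column per sex level, NA where absent
    spread(age, row_number2) %>%   # one column per age level, NA where absent
    group_by(group) %>%
    # across() replaces the deprecated mutate_each()/funs() (dplyr >= 1.0.0)
    mutate(across(-weight, ~ ifelse(is.na(.x), 0, 1))) %>%
    mutate(count = 1) %>%
    summarise(across(everything(), sum)) %>%
    mutate(weight = weight / count) %>%   # summed weight / n = mean weight
    select(-count) %>%
    t()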

Result:

           [,1]     [,2]
group     1.000  2.00000
weight   37.461 40.38807
M        25.000 28.00000
F        25.000 22.00000
oldest   13.000  7.00000
old      12.000 18.00000
young    13.000 17.00000
youngest 12.000  8.00000
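A less hacky route is to stack all factor columns into one long column, count level occurrences per group, widen the counts back out, and join the mean weight. This is a sketch that assumes dplyr >= 1.0.0 and tidyr >= 1.0.0 are installed (for across(), where(), pivot_longer() and pivot_wider()); the object names counts, means and sm are only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
set.seed(123)
s <- rbinom(100, 1, 0.5)
s <- factor(s, 0:1, c('M', 'F'))
a <- sample(1:4, 100, TRUE)
a <- factor(a, 1:4, c('oldest', 'old', 'young', 'youngest'))
w <- rnorm(100, 40, 10)
g <- rep(1:2, each = 50)
df <- data.frame(sex = s, age = a, weight = w, group = g)

# Count every level of every factor column per group, without naming levels
counts <- df %>%
  select(group, where(is.factor)) %>%
  # convert to character so factors with different level sets stack cleanly
  mutate(across(-group, as.character)) %>%
  pivot_longer(-group, names_to = "variable", values_to = "level") %>%
  count(group, level) %>%
  pivot_wider(names_from = level, values_from = n, values_fill = list(n = 0L))

# Numeric summary (mean weight) per group
means <- df %>%
  group_by(group) %>%
  summarise(weight = mean(weight))

sm <- left_join(counts, means, by = "group")
print(t(sm))
```

The count columns appear in the order pivot_wider() first meets each level, not in factor-level order; a select() afterwards can reorder them if that matters.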

user3603486

I'm assuming that for factors you want frequency tables, and for numeric columns (e.g. weight) you want the mean.

The following base-R code (no dplyr) does that, though it returns a list with one element per column, so the result may not be formatted the way you like.

sapply(df, function(x) {
  # factors: counts per level and group; numeric columns: mean per group
  if (is.factor(x)) table(x, df$group) else tapply(x, df$group, mean)
})
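The sapply() call returns a list with one element per column. If you want a single matrix shaped like the question's t(sm) output, the per-column pieces can be stacked with rbind(). The sketch below assumes every non-grouping column is either a factor or numeric; summarise_by_group() is a hypothetical helper written for this example, not a base R function.

```r
# Same example data as in the question
set.seed(123)
s <- rbinom(100, 1, 0.5)
s <- factor(s, 0:1, c('M', 'F'))
a <- sample(1:4, 100, TRUE)
a <- factor(a, 1:4, c('oldest', 'old', 'young', 'youngest'))
w <- rnorm(100, 40, 10)
g <- rep(1:2, each = 50)
df <- data.frame(sex = s, age = a, weight = w, group = g)

# Hypothetical helper: one row per factor level (counts) and one row per
# numeric column (mean), with one column per group.
# Assumes every non-group column is a factor or numeric.
summarise_by_group <- function(data, group_col) {
  g <- factor(data[[group_col]])
  vars <- setdiff(names(data), group_col)
  pieces <- lapply(vars, function(v) {
    x <- data[[v]]
    if (is.factor(x)) {
      # rows = levels of x, columns = groups; unclass() drops the "table" class
      unclass(table(x, g))
    } else {
      # a single row holding the group means, named after the column
      matrix(tapply(x, g, mean), nrow = 1, dimnames = list(v, levels(g)))
    }
  })
  do.call(rbind, pieces)
}

out <- summarise_by_group(df, "group")
print(out)
```

Because counts and means share one numeric matrix, the counts are stored as doubles; the row order follows the column order of df.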

You might also want to look at the reporttools package, in particular its tableNominal() and tableContinuous() functions.
