Reputation: 6020
I have a data frame that I want to summarise using dplyr. It contains several factor columns, and I want to report the count of each factor level per group.
Is there a way to do the following with dplyr without having to name each factor level in the summarise() call?
library(dplyr)
set.seed(123)
s <- rbinom(100,1,0.5)
s <- factor(s,0:1,c('M','F'))
a <- sample(1:4,100,TRUE)
a <- factor(a,1:4,c('oldest','old','young','youngest'))
w <- rnorm(100,40,10)
g <- rep(1:2,each=50)
df <- data.frame(sex=s, age=a, weight=w, group=g)
sm <- df %>%
  group_by(group) %>%
  summarise(
    male = sum(ifelse(sex == 'M', 1, 0)),
    female = sum(ifelse(sex == 'F', 1, 0)),
    youngest = sum(ifelse(age == 'youngest', 1, 0)),
    young = sum(ifelse(age == 'young', 1, 0)),
    old = sum(ifelse(age == 'old', 1, 0)),
    oldest = sum(ifelse(age == 'oldest', 1, 0)),
    weight = mean(weight)
  )
print(t(sm))
Result:
           [,1]     [,2]
group     1.000  2.00000
male     29.000 24.00000
female   21.000 26.00000
youngest 12.000  8.00000
young    13.000 17.00000
old      12.000 18.00000
oldest   13.000  7.00000
weight   37.461 40.38807
Upvotes: 1
Views: 2136
Reputation: 2177
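Using dplyr together with tidyr's spread() (albeit in a circuitous, hacky way!):
library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr
df %>%
  # Two row-id columns so that each spread() has unique values to fill with
  mutate(row_number1 = row_number(), row_number2 = row_number()) %>%
  # One column per factor level; NA where the row does not have that level
  spread(sex, row_number1) %>%
  spread(age, row_number2) %>%
  group_by(group) %>%
  # Turn the level columns into 0/1 indicators (weight is left alone).
  # Note: mutate_each(), summarize_each() and funs() are deprecated in current dplyr.
  mutate_each(funs(ifelse(is.na(.), 0, 1)), -weight) %>%
  # count holds the group size, so sum(weight) / count is the group mean
  mutate(count = 1) %>%
  summarize_each(funs(sum)) %>%
  mutate(weight = weight / (count)) %>%
  select(-count) %>%
  t()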
Result:
           [,1]     [,2]
group     1.000  2.00000
weight   37.461 40.38807
M        25.000 28.00000
F        25.000 22.00000
oldest   13.000  7.00000
old      12.000 18.00000
young    13.000 17.00000
youngest 12.000  8.00000
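For comparison, here is a rough sketch of a more direct route to the same kind of table. It is not the approach above: it assumes dplyr 1.0 or later (for across()) and tidyr 1.0 or later (for pivot_longer() and pivot_wider()), and the names level_counts, wide_counts, weight_means, variable, level and the group_ prefix are made up for illustration. The idea is to stack all factor columns into long form and let count() enumerate the levels, so no level is named by hand.

```r
library(dplyr)
library(tidyr)

# Same simulated data as in the question
set.seed(123)
s <- factor(rbinom(100, 1, 0.5), 0:1, c("M", "F"))
a <- factor(sample(1:4, 100, TRUE), 1:4, c("oldest", "old", "young", "youngest"))
w <- rnorm(100, 40, 10)
df <- data.frame(sex = s, age = a, weight = w, group = rep(1:2, each = 50))

# Stack every factor column into (variable, level) pairs, then count per group.
# Factors are converted to character first so columns with different levels
# can share one "level" column.
level_counts <- df %>%
  mutate(across(where(is.factor), as.character)) %>%
  pivot_longer(where(is.character), names_to = "variable", values_to = "level") %>%
  count(group, variable, level)

# One row per factor level, one column per group (missing combinations become 0)
wide_counts <- level_counts %>%
  pivot_wider(names_from = group, values_from = n,
              values_fill = list(n = 0L), names_prefix = "group_")

# Group means of the numeric column, computed separately
weight_means <- df %>%
  group_by(group) %>%
  summarise(weight = mean(weight))

print(wide_counts)
print(weight_means)
```

The counts and the means end up in two separate tables here, which avoids mixing integer counts and a floating-point mean in one column.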
Upvotes: 3
Reputation:
I am assuming that for factors you want tables, and for numeric columns (e.g. weight) you want the mean.
The following does not use dplyr, but it does what you want, though the result may not be formatted the way you like.
sapply(df, function(x) if (is.factor(x)) table(x, df$group) else tapply(x, df$group, mean))
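If a single matrix like the one in the question is preferred over the list that sapply() returns, the pieces can be bound together by hand. This is only a sketch in base R under the same assumptions (counts for factors, mean for weight); the names is_fac, counts, means and result are illustrative.

```r
# Same simulated data as in the question
set.seed(123)
s <- factor(rbinom(100, 1, 0.5), 0:1, c("M", "F"))
a <- factor(sample(1:4, 100, TRUE), 1:4, c("oldest", "old", "young", "youngest"))
w <- rnorm(100, 40, 10)
df <- data.frame(sex = s, age = a, weight = w, group = rep(1:2, each = 50))

# Which columns are factors?
is_fac <- vapply(df, is.factor, logical(1))

# A level-by-group count matrix for each factor, stacked row-wise;
# row names are the factor levels, column names are the group values
counts <- do.call(rbind, lapply(df[is_fac], function(x) unclass(table(x, df$group))))

# Per-group mean of the numeric column
means <- tapply(df$weight, df$group, mean)

# Append the means as one extra row named "weight"
result <- rbind(counts, weight = as.vector(means))
print(result)
```

Because table() keeps every factor level, levels that do not occur in a group still appear with a count of 0.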
You might also want to look at the reporttools package, including tableNominal and tableContinuous.
Upvotes: 2