I am trying to summarize data across two variables, and the output of summarize() is very unwieldy (at least in the R Notebook output, where the table breaks over multiple pages). I'd like one variable as the rows of the summary output and the other as the columns, with the mean for each combination of row and column values in the table cells. Some example data:
dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = sample(1:2, size = 4, replace = TRUE),  # length 4, recycled to 12 rows
  value = rnorm(12)
)
I would usually get my summary data frame like this:
library(dplyr)
dat1 %>% group_by(category, age) %>% summarize(mean(value))
but in my actual data each variable has 10+ levels, so the table is very long and hard to read. I would prefer a wide table with one column per age, which I created using:
dat1 %>%
  group_by(category) %>%
  summarize(mean.age1 = mean(value[age == 1]),
            mean.age2 = mean(value[age == 2]))
Is there a better way than hand-coding a mean column for each level?
You just need to use tidyr in addition to dplyr, doing something like this:
library(dplyr)
library(tidyr)

dat1 %>%
  group_by(category, age) %>%
  summarise(mean = mean(value)) %>%  # one row per category/age combination
  spread(age, mean, sep = '')        # widen: one column per age level (age1, age2, ...)
Output is as follows:
Source: local data frame [3 x 3]
Groups: category [3]
category age1 age2
* <fctr> <dbl> <dbl>
1 catA 0.2930104 0.3861381
2 catB 0.5752186 0.1454201
3 catC 1.0845645 0.3117227
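In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), so a sketch of the same reshape with the newer verb might look like the following. It assumes dplyr >= 1.0.0 (for the `.groups` argument) and tidyr >= 1.0.0, and it swaps in a hypothetical deterministic version of dat1 so that every category is guaranteed to have both age levels and the result is reproducible.

```r
library(dplyr)
library(tidyr)

# Hypothetical deterministic stand-in for dat1: each category gets
# ages 1, 2, 1, 2 and the values 1..12, so both age columns appear.
dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = rep(c(1, 2), times = 6),
  value = 1:12
)

wide <- dat1 %>%
  group_by(category, age) %>%
  summarise(mean = mean(value), .groups = "drop") %>%  # long summary, ungrouped
  pivot_wider(names_from = age, values_from = mean,
              names_prefix = "age")                    # columns age1, age2

print(wide)
```

With this stand-in data, age1 holds 2, 6, 10 and age2 holds 3, 7, 11. Dropping the grouping in summarise() also avoids the leftover "Groups: category" header seen above.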
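If extra packages are not wanted, base R's tapply() can build the same category-by-age grid directly. This is a minimal sketch using the same hypothetical deterministic stand-in for dat1; note that the result is a matrix rather than a data frame.

```r
# Hypothetical deterministic stand-in for dat1 (same as the tidyr sketch).
dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = rep(c(1, 2), times = 6),
  value = 1:12
)

# tapply() splits value by both grouping vectors and applies mean() to
# each cell: rows are categories, columns are ages.
means <- tapply(dat1$value, list(dat1$category, dat1$age), mean)
print(means)
```

Combinations with no observations come out as NA, and as.data.frame(means) converts the matrix if a data frame is needed downstream.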