Esther

Reputation: 441

Summarizing data in a cross-table with one grouping variable in columns

I am trying to summarize data across two variables, and the output of summarize is very unwieldy (at least in the R Notebook output, where the table breaks over multiple pages). I'd like one variable as the rows of the summary output and the other as the columns, with each cell holding the mean for that combination of row and column. Some example data:

dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = sample(1:2, size = 4, replace = TRUE),  # recycled to length 12
  value = rnorm(12)
)

and then I would usually get my summary dataframe like this:

dat1 %>% group_by(category, age) %>% summarize(mean(value))

which gives one row per combination of category and age. In my actual data, though, each variable has 10+ levels, so the table is very long and hard to read. I would prefer something like the output of the following, which I created using:

dat1 %>%
  group_by(category) %>%
  summarize(mean.age1 = mean(value[age == 1]),
            mean.age2 = mean(value[age == 2]))


There must be a better way than hand-coding each mean column?

Upvotes: 0

Views: 2855

Answers (1)

Gopala

Reputation: 10483

You just need to use tidyr in addition to dplyr: summarise by both variables, then spread the age levels into columns, like this:

library(dplyr)
library(tidyr)
dat1 %>%
  group_by(category, age) %>%
  summarise(mean = mean(value)) %>%
  spread(age, mean, sep = '')

Output is as follows:

Source: local data frame [3 x 3]
Groups: category [3]

  category      age1      age2
*   <fctr>     <dbl>     <dbl>
1     catA 0.2930104 0.3861381
2     catB 0.5752186 0.1454201
3     catC 1.0845645 0.3117227
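
In more recent versions of tidyr, spread() is superseded by pivot_wider(), so the same reshaping can probably be written as below. This is only a sketch, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the .groups argument). The seed and the deterministic age column are my own additions so that the example is reproducible and every category has both ages; they are not part of the question's data.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: added only for reproducibility
dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = rep(1:2, times = 6),  # assumption: fixed pattern instead of sample()
  value = rnorm(12)
)

wide <- dat1 %>%
  group_by(category, age) %>%
  # one mean per category/age pair; drop grouping so the result is a plain tibble
  summarise(mean = mean(value), .groups = "drop") %>%
  # one column per age level, named age1, age2, ...
  pivot_wider(names_from = age, values_from = mean, names_prefix = "age")

print(wide)
```

The names_prefix argument plays the role that sep = '' plays in the spread() call above: it puts "age" in front of each level so the columns are not just named 1 and 2.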

Upvotes: 2
