Creating pivot table for fully categorical dataset in R

Question

I have a table where all the values are categorical. It looks like this:

sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes

Is there an easy way to aggregate the data to count how many people answered yes and no to a question without writing loops?

My goal is to get something like:

question
    yes no
reg1 15 20
reg2 30 11

etc.

I've checked the reshape2 package, but it doesn't seem to do what I need.

Ben Bolker · Accepted Answer

The simplest answer seems to be table().

Make up data:

dd <- read.table(text="
sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes",
header=TRUE)

Analyze:

with(dd,table(region,question1))
##       question1
## region no yes
##   reg1  0   1
##   reg2  0   1
##   reg3  1   1

Only "no" and "yes" columns appear here because question1 has no "maybe" responses. For a question that does have them, you could filter them out first:

dd2 <- subset(dd,question1 %in% c("no","yes"))
with(dd2,table(...))

(with() is just a convenience; you could type table(dd$region,dd$question1) instead.)
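
As an alternative to filtering rows, here is a minimal sketch (not part of the original answer) that fixes the columns by converting the answers to a factor with explicit levels. Anything outside those levels becomes NA, which table() drops by default. The name tab is only illustrative, and question2 is used because it contains "maybe" responses.

```r
# Same made-up data as above
dd <- read.table(text="
sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes",
header=TRUE)

# Keep only "yes"/"no", in that column order; "maybe" turns into NA
q2 <- factor(dd$question2, levels=c("yes","no"))

# table() ignores NA by default, so "maybe" rows are not counted
tab <- table(region=dd$region, question2=q2)
tab
##       question2
## region yes no
##   reg1   1  0
##   reg2   0  0
##   reg3   1  0
```

Unlike subset(), this keeps a row for every region (reg2 here) and a column for every listed level, even when the count is zero.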

For more complex questions you may well want to look into reshape2/plyr or tidyr/dplyr ...

For example, to summarize results for all questions simultaneously:

library("reshape2")
ddm <- melt(dd,id.vars=c("sample","region"))
tt <- with(ddm,table(region,value,variable))

This gives us a 3x3x2 table (a region x value table for each question). To turn it into a data frame, melt it to long format and then cast it so that each region/question pair gets one row:

ttm <- melt(tt,value.name="count")
res <- dcast(ttm,region+variable~value,value.var="count")
##   region  variable maybe no yes
## 1   reg1 question1     0  0   1
## 2   reg1 question2     0  0   1
## 3   reg2 question1     0  0   1
## 4   reg2 question2     1  0   0
## 5   reg3 question1     0  1   1
## 6   reg3 question2     1  0   1
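
For comparison, here is a rough tidyr/dplyr sketch of the same all-questions summary. It is not from the original answer. It assumes both packages are installed, with a tidyr version that provides pivot_longer() and pivot_wider() (1.0.0 or later), and the name res2 is made up for illustration.

```r
library("dplyr")
library("tidyr")

# Same made-up data as above
dd <- read.table(text="
sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes",
header=TRUE)

res2 <- dd %>%
  # Stack the question columns: one row per sample/question pair
  pivot_longer(starts_with("question"),
               names_to="variable", values_to="value") %>%
  # Count each region/question/answer combination that occurs
  count(region, variable, value) %>%
  # Spread the answers into columns, filling absent combinations with 0
  pivot_wider(names_from=value, values_from=n, values_fill=0)
res2
```

One difference from the table() route: count() only returns combinations that actually occur, so a region/question pair with no rows at all would be missing from the result instead of appearing as a row of zeros.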
