Reputation: 14751
I have a table whose header looks like this (I've simplified it):
id, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10
where each column, except for id, is a categorical variable. Let's name the categories A, B, C, D, E.
I would like to create a contingency table for some of the columns, such as the one below (for brevity, I have not put sample numbers in the cells). Getting the total column/row would be great but is not mandatory; I can calculate it myself later.
a1 a2 a3 a4 Total
----------------------
A|
B|
C|
D|
E|
Total|
Thus, the question is: how do I create a crosstab based on multiple columns in R? The examples I've seen with table() and xtabs() use only a single column. In my case, the columns are adjacent, so one crosstab would summarize columns a1..a4, another a5..a7, and so on. I hope there is an elegant way to do this.
I'm a programmer, but a newbie in R.
Thank you in advance.
Upvotes: 2
Views: 11806
Reputation: 176718
Here's how to do it using base R commands. You don't need the for loop if every column has the same factor levels, but the loop would be a good fail-safe.
> set.seed(21)
> df <- data.frame(
+ id=1:20,
+ a1=sample(letters[1:4],20,TRUE),
+ a2=sample(letters[1:5],20,TRUE),
+ a3=sample(letters[2:5],20,TRUE),
+ a4=sample(letters[1:5],20,TRUE),
+ a5=sample(letters[1:5],20,TRUE),
+ a6=sample(letters[1:5],20,TRUE) )
>
> for(i in 2:NCOL(df)) {
+ levels(df[,i]) <- list(a="a",b="b",c="c",d="d",e="e")
+ }
>
> addmargins(mapply(table,df[,-1]))
a1 a2 a3 a4 a5 a6 Sum
a 6 2 0 2 5 3 18
b 3 3 7 2 1 3 19
c 5 3 1 6 5 3 23
d 6 8 6 1 5 3 29
e 0 4 6 9 4 8 31
Sum 20 20 20 20 20 20 120
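If only some of the columns are needed (a1..a4 in the question), the same idea can be sketched as below. This is a minimal sketch, not the code that produced the output above: the sample data and the names lv, cols and counts are made up for illustration. It sets the levels explicitly with factor() instead of the levels<- loop, which also works when the columns are plain character vectors (the default for data.frame() since R 4.0.0).

```r
# Hypothetical sample data; column names follow the question.
set.seed(21)
lv <- c("A", "B", "C", "D", "E")   # the full set of categories
df <- data.frame(
  id = 1:20,
  a1 = sample(lv[1:4], 20, TRUE),  # "E" never occurs in this column
  a2 = sample(lv, 20, TRUE),
  a3 = sample(lv[2:5], 20, TRUE),  # "A" never occurs in this column
  a4 = sample(lv, 20, TRUE),
  a5 = sample(lv, 20, TRUE),
  stringsAsFactors = FALSE
)

cols <- c("a1", "a2", "a3", "a4")  # the columns to summarize

# Tabulate every column against the same level set, so each column
# contributes exactly five counts (zero where a category is absent).
counts <- sapply(df[cols], function(x) table(factor(x, levels = lv)))

# Add the "Sum" row and column.
result <- addmargins(counts)
print(result)
```

Changing cols to c("a5", "a6", "a7") would give the next crosstab, provided those columns exist in the data frame.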
Upvotes: 3
Reputation: 9047
Your data is poorly formatted for this purpose. Here's one approach to appropriately reshaping the data with the reshape package.
library(reshape)
data.m <- melt(data, id = "id")
To compute a table for all levels, with margins, you could use
cast(data.m, value ~ variable, margins = TRUE)
For a subset, take the relevant subset of data.m.
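The long-format idea can also be tried without installing a package. The sketch below swaps in base R's stack() and table() for melt() and cast(), so it is an alternative to the reshape calls above rather than a reproduction of them. The sample data and the names lv, df and long are assumptions for illustration.

```r
# Hypothetical sample data; column names follow the question.
set.seed(1)
lv <- c("A", "B", "C", "D", "E")
df <- data.frame(
  id = 1:20,
  a1 = sample(lv, 20, TRUE),
  a2 = sample(lv, 20, TRUE),
  a3 = sample(lv, 20, TRUE),
  a4 = sample(lv, 20, TRUE),
  a5 = sample(lv, 20, TRUE),
  stringsAsFactors = FALSE
)

# Take the relevant subset of columns and reshape it to long format:
# stack() returns one row per cell, with columns `values` and `ind`.
long <- stack(df[c("a1", "a2", "a3", "a4")])

# Fix the category set so that absent categories still get a row.
long$values <- factor(long$values, levels = lv)

# Cross-tabulate category by source column and add the margins.
tab <- addmargins(table(long$values, long$ind))
print(tab)
```

Note that stack() only keeps plain vector columns, so factor columns would need as.character() first.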
Upvotes: 7