Reputation: 568
I am working on a questionnaire and the analysis will be based on the geographic region (a column in my data table).
In R, I am trying to figure out a way to get my entire questionnaire summarised by geographic region(KPG). So every geographic region as a row and each possible answer to a question (A001, A0002 etc.) as a column (including 0 values).
table(dummyframe$KPG, dummyframe$A001)
1 2 3 4 5
111 0 1 1 0 0
112 1 1 0 0 0
113 4 0 1 0 0
114 0 3 1 1 0
115 0 0 1 2 1
116 1 0 0 0 0
xtabs(~KPG+A001,dummyframe)
A001
KPG 1 2 3 4 5
111 0 1 1 0 0
112 1 1 0 0 0
113 4 0 1 0 0
114 0 3 1 1 0
115 0 0 1 2 1
116 1 0 0 0 0
both ways return the frequency count in the desired format and returns a table-format for question1
I expected to be able to do this for the many columns in my questionaire by adding like so:
table(dummyframe$KPG, df$A001+A002)
but this results in evaluating the region against question one AND then question 2 as evaluated against question 1, whereas I want question 1 by region and question 2 by region, but the questions not evaluated against each other.
I would like to apply the table function to each column of my data frame separately in one step and then bind the answers together so that my table is all answers by regions. I tried using aggregate
aggregate(.~KPG, dummyframe, count)
KPG A001 A002 A003 A004
1 111 2, 3 4, 5 2, 3 1, 3
2 112 1, 2 3, 5 3, 4 1, 2
3 113 1, 3 1, 2, 3, 4 1, 3, 4 1, 2, 4
4 114 2, 3, 4 1, 2, 3, 4 1, 3, 4 0, 1, 2, 4
5 115 3, 4, 5 2, 4, 5 0, 2, 3, 4 0, 3
6 116 1 1 2 1
A005
1 0, 4
2 4
3 0, 2, 3
4 1, 4
5 0, 1, 4
6 2
and this results in each grid cell being filled with c(1,3,5) values when answers 1, 3 and 5 were given and is, as you can assume, very unhelpful.
Any ideas for a loop? lapply? tapply?
UPDATE: added data
structure(list(KPG = c(111L, 111L, 112L, 112L, 113L, 113L, 113L,
113L, 113L, 114L, 114L, 114L, 114L, 114L, 115L, 115L, 115L, 115L,
116L), A001 = c(2L, 3L, 1L, 2L, 1L, 1L, 3L, 1L, 1L, 2L, 2L, 4L,
2L, 3L, 3L, 4L, 5L, 4L, 1L), A002 = c(4L, 5L, 5L, 3L, 2L, 1L,
3L, 4L, 2L, 3L, 2L, 4L, 4L, 1L, 4L, 5L, 5L, 2L, 1L), A003 = c(3L,
2L, 3L, 4L, 3L, 4L, 1L, 4L, 4L, 4L, 1L, 3L, 3L, 4L, 2L, 4L, 0L,
3L, 2L), A004 = c(1L, 3L, 1L, 2L, 2L, 1L, 1L, 1L, 4L, 4L, 2L,
1L, NA, 0L, 3L, 0L, 3L, 0L, 1L), A005 = c(0L, 4L, 4L, 4L, 0L,
0L, 3L, 3L, 2L, 1L, 1L, 4L, 1L, 4L, 4L, 0L, 1L, 1L, 2L)), .Names = c("KPG",
"A001", "A002", "A003", "A004", "A005"), row.names = c(NA, 19L
), class = "data.frame")
UPDATE: expected output
A001 A002 A003 A004 A005
1 2 3 4 5 1 2 3 4 5 0 1 2 3 4 0 1 2 3 4 0 1 2
111 0 1 1 0 0 0 0 0 1 1 0 0 1 1 0 0 1 0 1 0 1 0 0
112 1 1 0 0 0 0 0 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0
113 4 0 1 0 0 1 2 1 1 0 0 1 0 1 3 0 3 1 0 1 2 0 1
114 0 3 1 1 0 1 1 1 2 0 0 1 0 2 2 1 1 1 0 1 0 3 0
115 0 0 1 2 1 0 1 0 1 2 1 0 1 1 1 2 0 0 2 0 1 2 0
116 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1
do.call("cbind", lapply(names(dummyframe[-1]), function(x) { temp <- as.data.frame.matrix(table(dummyframe[["KPG"]], dummyframe[[x]])); setNames(temp, paste0(x, names(temp))) }))
--> as suggested gives you the expected output, but merges questions and answer numbers (which can be easily formatted in Excel)
Upvotes: 1
Views: 106
Reputation: 389135
We can extend what you were doing for one column to multiple columns by using lapply
and then cbind
the results together
do.call("cbind", lapply(df[-1], function(x) table(df$KPG, x)))
# 1 2 3 4 5 1 2 3 4 5 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
#111 0 1 1 0 0 0 0 0 1 1 0 0 1 1 0 0 1 0 1 0 1 0 0 0 1
#112 1 1 0 0 0 0 0 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 2
#113 4 0 1 0 0 1 2 1 1 0 0 1 0 1 3 0 3 1 0 1 2 0 1 2 0
#114 0 3 1 1 0 1 1 1 2 0 0 1 0 2 2 1 1 1 0 1 0 3 0 0 2
#115 0 0 1 2 1 0 1 0 1 2 1 0 1 1 1 2 0 0 2 0 1 2 0 0 1
#116 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0
Upvotes: 1