Gmichael
Gmichael

Reputation: 568

Getting frequency counts for multipe columns grouped BY another column

I am working on a questionnaire and the analysis will be based on the geographic region (a column in my data table).

In R, I am trying to figure out a way to get my entire questionnaire summarised by geographic region(KPG). So every geographic region as a row and each possible answer to a question (A001, A0002 etc.) as a column (including 0 values).

table(dummyframe$KPG, dummyframe$A001)
      1 2 3 4 5
  111 0 1 1 0 0
  112 1 1 0 0 0
  113 4 0 1 0 0
  114 0 3 1 1 0
  115 0 0 1 2 1
  116 1 0 0 0 0
xtabs(~KPG+A001,dummyframe)
 A001
KPG   1 2 3 4 5
  111 0 1 1 0 0
  112 1 1 0 0 0
  113 4 0 1 0 0
  114 0 3 1 1 0
  115 0 0 1 2 1
  116 1 0 0 0 0

both ways return the frequency count in the desired format and returns a table-format for question1

I expected to be able to do this for the many columns in my questionaire by adding like so:

table(dummyframe$KPG, df$A001+A002)

but this results in evaluating the region against question one AND then question 2 as evaluated against question 1, whereas I want question 1 by region and question 2 by region, but the questions not evaluated against each other.

I would like to apply the table function to each column of my data frame separately in one step and then bind the answers together so that my table is all answers by regions. I tried using aggregate

aggregate(.~KPG, dummyframe, count)
KPG    A001       A002       A003       A004
1 111    2, 3       4, 5       2, 3       1, 3
2 112    1, 2       3, 5       3, 4       1, 2
3 113    1, 3 1, 2, 3, 4    1, 3, 4    1, 2, 4
4 114 2, 3, 4 1, 2, 3, 4    1, 3, 4 0, 1, 2, 4
5 115 3, 4, 5    2, 4, 5 0, 2, 3, 4       0, 3
6 116       1          1          2          1
 A005
1    0, 4
2       4
3 0, 2, 3
4    1, 4
5 0, 1, 4
6       2

and this results in each grid cell being filled with c(1,3,5) values when answers 1, 3 and 5 were given and is, as you can assume, very unhelpful.

Any ideas for a loop? lapply? tapply?

UPDATE: added data

structure(list(KPG = c(111L, 111L, 112L, 112L, 113L, 113L, 113L, 
113L, 113L, 114L, 114L, 114L, 114L, 114L, 115L, 115L, 115L, 115L, 
116L), A001 = c(2L, 3L, 1L, 2L, 1L, 1L, 3L, 1L, 1L, 2L, 2L, 4L, 
2L, 3L, 3L, 4L, 5L, 4L, 1L), A002 = c(4L, 5L, 5L, 3L, 2L, 1L, 
3L, 4L, 2L, 3L, 2L, 4L, 4L, 1L, 4L, 5L, 5L, 2L, 1L), A003 = c(3L, 
2L, 3L, 4L, 3L, 4L, 1L, 4L, 4L, 4L, 1L, 3L, 3L, 4L, 2L, 4L, 0L, 
3L, 2L), A004 = c(1L, 3L, 1L, 2L, 2L, 1L, 1L, 1L, 4L, 4L, 2L, 
1L, NA, 0L, 3L, 0L, 3L, 0L, 1L), A005 = c(0L, 4L, 4L, 4L, 0L, 
0L, 3L, 3L, 2L, 1L, 1L, 4L, 1L, 4L, 4L, 0L, 1L, 1L, 2L)), .Names =      c("KPG", 
"A001", "A002", "A003", "A004", "A005"), row.names = c(NA, 19L
), class = "data.frame")

UPDATE: expected output

    A001      A002      A003      A004      A005
    1 2 3 4 5 1 2 3 4 5 0 1 2 3 4 0 1 2 3 4 0 1 2
111 0 1 1 0 0 0 0 0 1 1 0 0 1 1 0 0 1 0 1 0 1 0 0
112 1 1 0 0 0 0 0 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0
113 4 0 1 0 0 1 2 1 1 0 0 1 0 1 3 0 3 1 0 1 2 0 1
114 0 3 1 1 0 1 1 1 2 0 0 1 0 2 2 1 1 1 0 1 0 3 0
115 0 0 1 2 1 0 1 0 1 2 1 0 1 1 1 2 0 0 2 0 1 2 0
116 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1

do.call("cbind", lapply(names(dummyframe[-1]), function(x) { temp <- as.data.frame.matrix(table(dummyframe[["KPG"]], dummyframe[[x]])); setNames(temp, paste0(x, names(temp))) }))

--> as suggested gives you the expected output, but merges questions and answer numbers (which can be easily formatted in Excel)

Upvotes: 1

Views: 106

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389135

We can extend what you were doing for one column to multiple columns by using lapply and then cbind the results together

do.call("cbind", lapply(df[-1], function(x) table(df$KPG, x)))


#    1 2 3 4 5 1 2 3 4 5 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
#111 0 1 1 0 0 0 0 0 1 1 0 0 1 1 0 0 1 0 1 0 1 0 0 0 1
#112 1 1 0 0 0 0 0 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 2
#113 4 0 1 0 0 1 2 1 1 0 0 1 0 1 3 0 3 1 0 1 2 0 1 2 0
#114 0 3 1 1 0 1 1 1 2 0 0 1 0 2 2 1 1 1 0 1 0 3 0 0 2
#115 0 0 1 2 1 0 1 0 1 2 1 0 1 1 1 2 0 0 2 0 1 2 0 0 1
#116 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0

Upvotes: 1

Related Questions