Proportion of responses across choices and groups

Question

I am fairly familiar with R, but my data now require iterative loops with multiple conditions. I have seen examples that use various forms of *apply(), as well as colSums() and rowSums(), to perform the kind of transformations I need, but I want to make these tasks more efficient, perhaps by nesting or iterating a loop. Also, existing recommendations do not account for the data lost by ignoring or dropping "NA" items, and I need to retain this information.

My general data format is as follows:

group <- c("A", "B", "C", "A", "C" [...])

individual <- c("1", "2", "3", "4", "5" [...])

choice1 <- c("1", "0", "1", "1", "NA")

choice2 <- c("1", "NA", "1", "0", "NA")

[...]

choice10 <- c("1", "0", "1", "1", "NA")

I need to count each of the three possible responses (1 == yes; 0 == no; NA == opt-out) across choices and across groups, and then convert these counts into percentages. Where I have had the most difficulty with previous methods like *apply() or summation across rows/columns is that my "NA" values (opt-outs) are ignored, or prevent me from taking percentages of choice values across groups. Any specific advice or demonstration of how to either ignore OR retain the opt-outs/NAs within the loop structure would be greatly appreciated.

The output would look a bit like the following:

yes.count_bychoice

no.count_bychoice

optout.count_bychoice

percentyes_bychoice_bygroup

percentno_bychoice_bygroup

percentout_bychoice_bygroup

Ferdinand.kraft · Accepted Answer

First things first. Build a data.frame, like this:

d <- data.frame(group=group, individual=individual, choice1=choice1 ...)
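One caveat before building the data.frame: the vectors in the question are character vectors, so "NA" there is an ordinary string rather than a missing value, and is.na() would not detect it. The following is a minimal sketch of one way to convert such columns, assuming every choice column is named with a "choice" prefix; the names d_chr, d_num and choice_cols are made up for this illustration.

```r
# Character input as written in the question (only two choice columns here).
group      <- c("A", "B", "C", "A", "C")
individual <- c("1", "2", "3", "4", "5")
choice1    <- c("1", "0", "1", "1", "NA")
choice2    <- c("1", "NA", "1", "0", "NA")

d_chr <- data.frame(group, individual, choice1, choice2,
                    stringsAsFactors = FALSE)

# Assumption: all choice columns start with "choice".
choice_cols <- grep("^choice", names(d_chr), value = TRUE)

d_num <- d_chr
d_num[choice_cols] <- lapply(d_chr[choice_cols], function(x) {
  x[x %in% "NA"] <- NA   # turn the literal string "NA" into a real missing value
  as.integer(x)          # "1"/"0" become 1L/0L, NA stays NA
})

sapply(d_num[choice_cols], function(x) sum(is.na(x)))  # opt-outs per choice
```

After this step the opt-outs are real NA values, so is.na() can count them.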

I'll use this as an example:

d <- data.frame(group=sample(LETTERS[1:4],20,TRUE), individual=1:20,
choice1=sample(c(0,1,NA),20,TRUE), choice2=sample(c(0,1,NA),20,TRUE))

I get

> head(d)
  group individual choice1 choice2
1     D          1       1      NA
2     A          2      NA      NA
3     C          3       1       1
4     A          4       1      NA
5     B          5       0      NA
6     B          6       1       1

We are going to use the following functions:

f <- function(x) c(yes=sum(x==1,na.rm=TRUE),no=sum(x==0,na.rm=TRUE),optout=sum(is.na(x)))

for counting and

g <- function(x) f(x)/length(x)

for percentages, expressed as proportions between 0 and 1.
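To see how the two helpers treat opt-outs, here is a small check on a hand-made vector (the vector x is made up for this illustration). Because the denominator in g is length(x), the NAs stay in the total, and the three shares add up to 1.

```r
f <- function(x) c(yes = sum(x == 1, na.rm = TRUE),
                   no = sum(x == 0, na.rm = TRUE),
                   optout = sum(is.na(x)))
g <- function(x) f(x) / length(x)

x <- c(1, 0, NA, 1)   # two yes, one no, one opt-out

f(x)         # yes = 2, no = 1, optout = 1
g(x)         # yes = 0.50, no = 0.25, optout = 0.25
100 * g(x)   # the same shares on a 0-100 scale
```

To ignore the opt-outs instead, one option would be to divide by sum(!is.na(x)) rather than length(x).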

For the global counts you can use:

counts <- apply(d[,-(1:2)], 2, FUN=f)

Result:

> counts
       choice1 choice2
yes         11       8
no           4       2
optout       5      10
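As a cross-check on f, base R's table() can also retain the opt-outs when it is given useNA = "always". This is only a sketch of an alternative, not part of the approach above; the vector x is made up for the illustration. Wrapping the values in factor() with explicit levels keeps a zero count for a response that never occurs.

```r
x <- c(1, 0, NA, 1, NA)

# Count yes (1), no (0) and opt-out (NA) in one call.
tab <- table(factor(x, levels = c(1, 0)), useNA = "always")
tab

# Shares of all responses, opt-outs included in the denominator.
tab / length(x)
```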

Switching the function to g gives the percentages:

> apply(d[,-(1:2)], 2, FUN=g)
       choice1 choice2
yes       0.55     0.4
no        0.20     0.1
optout    0.25     0.5

To get the counts per group per choice you can use this:

counts_grp <- aggregate(d[,-(1:2)], by=list(group=d$group), FUN=f)

Result:

> counts_grp
  group choice1.yes choice1.no choice1.optout choice2.yes choice2.no choice2.optout
1     A           1          0              3           2          0              2
2     B           3          2              0           3          1              1
3     C           4          0              2           3          0              3
4     D           3          2              0           0          1              4

For percentages, simply switch the function to g:

> aggregate(d[,-(1:2)], by=list(group=d$group), FUN=g)
  group choice1.yes choice1.no choice1.optout choice2.yes choice2.no choice2.optout
1     A   0.2500000  0.0000000      0.7500000         0.5        0.0            0.5
2     B   0.6000000  0.4000000      0.0000000         0.6        0.2            0.2
3     C   0.6666667  0.0000000      0.3333333         0.5        0.0            0.5
4     D   0.6000000  0.4000000      0.0000000         0.0        0.2            0.8
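One detail that the printed output hides: because f and g return three values, aggregate() stores each choice as a three-column matrix inside the result, so the data.frame has one column per choice rather than one per printed header. The sketch below, on a small made-up data set, flattens the result into ordinary columns and rescales it to 0-100; the names props_grp and flat are chosen for this illustration.

```r
f <- function(x) c(yes = sum(x == 1, na.rm = TRUE),
                   no = sum(x == 0, na.rm = TRUE),
                   optout = sum(is.na(x)))
g <- function(x) f(x) / length(x)

# Small deterministic example (made up for illustration).
d <- data.frame(group = c("A", "A", "B", "B", "B"),
                individual = 1:5,
                choice1 = c(1, NA, 0, 1, 1),
                choice2 = c(NA, NA, 1, 0, 1))

props_grp <- aggregate(d[, -(1:2)], by = list(group = d$group), FUN = g)
ncol(props_grp)   # 3: group, choice1, choice2 (each choice is a matrix)

# Expand the matrix columns into plain columns such as choice1.yes.
flat <- do.call(data.frame, props_grp)
names(flat)

# Rescale the shares to a 0-100 scale.
flat[-1] <- round(100 * flat[-1], 1)
flat
```

Note that this uses the data.frame interface of aggregate(), as in the answer. The formula interface defaults to na.action = na.omit, which would drop rows containing NA before f or g ever sees them.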
