DV Hughes

Reputation: 305

Proportion of responses across choices and groups

I am fairly familiar with R, but my data now require iterative loops with multiple conditions, which I still need to learn. I have seen examples that use various forms of *apply(), as well as colSums() and rowSums(), to perform the kind of data transformations I need, but I want to make these tasks more efficient, perhaps by nesting or iterating a loop. Also, existing recommendations do not account for the data lost by ignoring or dropping "NA" items, and I need to retain this information.

My general data format is as follows:

group <- c("A", "B", "C", "A", "C" [...])

individual <- c("1", "2", "3", "4", "5" [...])

choice1 <- c("1", "0", "1", "1", "NA")

choice2 <- c("1", "NA", "1", "0", "NA")

[...]

choice10 <- c("1", "0", "1", "1", "NA")

I need to count each of the three responses (1 == yes; 0 == no; NA == opted out) across choices and across groups, and then convert these counts into percentages. Where I have had the most difficulty with previous methods, such as *apply() or summing across rows/columns, is that my "NA" values (opt-outs) are either ignored or prevent me from taking percentages of choice values across groups. Any specific advice or demonstration of how to either ignore OR retain the opt-outs/NAs within the loop structure would be greatly appreciated.

The output would look a bit like the following:

yes.count_bychoice

no.count_bychoice

optout.count_bychoice

percentyes_bychoice_bygroup

percentno_bychoice_bygroup

percentout_bychoice_bygroup

Upvotes: 0

Views: 265

Answers (2)

Ferdinand.kraft

Reputation: 12819

First things first: build a data.frame, like this:

d <- data.frame(group=group, individual=individual, choice1=choice1 ...)

I'll use this as an example:

d <- data.frame(group=sample(LETTERS[1:4],20,replace=TRUE), individual=1:20,
choice1=sample(c(0,1,NA),20,replace=TRUE), choice2=sample(c(0,1,NA),20,replace=TRUE))

I get

> head(d)
  group individual choice1 choice2
1     D          1       1      NA
2     A          2      NA      NA
3     C          3       1       1
4     A          4       1      NA
5     B          5       0      NA
6     B          6       1       1

We are going to use the following functions:

f <- function(x) c(yes=sum(x==1,na.rm=TRUE),no=sum(x==0,na.rm=TRUE),optout=sum(is.na(x)))

for counting and

g <- function(x) f(x)/length(x)

for percentages.
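As a quick sanity check, here is a minimal sketch of what these two helpers return for a single column. The vector x is made-up illustration data, not taken from the question.

```r
# Same helpers as defined above
f <- function(x) c(yes = sum(x == 1, na.rm = TRUE),
                   no = sum(x == 0, na.rm = TRUE),
                   optout = sum(is.na(x)))
g <- function(x) f(x) / length(x)

# Hypothetical column: three yes, one no, one opt-out
x <- c(1, 0, 1, 1, NA)

f(x)  # counts: yes = 3, no = 1, optout = 1
g(x)  # proportions of all five respondents: 0.6, 0.2, 0.2
```

Because g divides by length(x), opt-outs stay in the denominator, so the three proportions always sum to 1.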

For the global counts you can use:

counts <- apply(d[,-(1:2)], 2, FUN=f)

Result:

> counts
       choice1 choice2
yes         11       8
no           4       2
optout       5      10

Changing the function you get the percentages:

> apply(d[,-(1:2)], 2, FUN=g)
       choice1 choice2
yes       0.55     0.4
no        0.20     0.1
optout    0.25     0.5

To get the counts per group per choice you can use this:

counts_grp <- aggregate(d[,-(1:2)], by=list(group=d$group), FUN=f)

Result:

> counts_grp
  group choice1.yes choice1.no choice1.optout choice2.yes choice2.no choice2.optout
1     A           1          0              3           2          0              2
2     B           3          2              0           3          1              1
3     C           4          0              2           3          0              3
4     D           3          2              0           0          1              4
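One detail that may matter if you want to work with this result further: because f returns a vector of length three, aggregate() stores each choice as a single matrix column, and names such as choice1.yes only appear when printing. The sketch below, which uses small made-up data instead of the random example above, shows one common way to flatten the result into ordinary columns with do.call(data.frame, ...).

```r
f <- function(x) c(yes = sum(x == 1, na.rm = TRUE),
                   no = sum(x == 0, na.rm = TRUE),
                   optout = sum(is.na(x)))

# Small illustrative data set (assumed values, not the random sample above)
d <- data.frame(
  group = c("A", "B", "C", "A", "C"),
  individual = 1:5,
  choice1 = c(1, 0, 1, 1, NA),
  choice2 = c(1, NA, 1, 0, NA)
)

counts_grp <- aggregate(d[, -(1:2)], by = list(group = d$group), FUN = f)
ncol(counts_grp)  # group plus one matrix column per choice

# Flatten the matrix columns into regular columns
flat <- do.call(data.frame, counts_grp)
names(flat)       # group, choice1.yes, choice1.no, choice1.optout, ...
```

After flattening, columns such as flat$choice1.yes can be selected and plotted like any other data.frame column.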

For percentages you can simply switch the function:

> aggregate(d[,-(1:2)], by=list(group=d$group), FUN=g)
  group choice1.yes choice1.no choice1.optout choice2.yes choice2.no choice2.optout
1     A   0.2500000  0.0000000      0.7500000         0.5        0.0            0.5
2     B   0.6000000  0.4000000      0.0000000         0.6        0.2            0.2
3     C   0.6666667  0.0000000      0.3333333         0.5        0.0            0.5
4     D   0.6000000  0.4000000      0.0000000         0.0        0.2            0.8
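The vectors in the question store the responses as text, including the string "NA", whereas the functions above expect numbers with real missing values. The following is a rough sketch, under that assumption, of converting such columns first. It also shows a hypothetical helper, g_answered (not part of the answer above), for the case where opt-outs should be ignored and percentages taken only among those who answered.

```r
# Data shaped like the question's vectors: character, with "NA" as text
d_chr <- data.frame(
  group = c("A", "B", "C", "A", "C"),
  individual = c("1", "2", "3", "4", "5"),
  choice1 = c("1", "0", "1", "1", "NA"),
  choice2 = c("1", "NA", "1", "0", "NA"),
  stringsAsFactors = FALSE
)

choice_cols <- grep("^choice", names(d_chr), value = TRUE)

# Replace the text "NA" with a real missing value, then convert to numeric
d_chr[choice_cols] <- lapply(d_chr[choice_cols], function(x) {
  x[x == "NA"] <- NA
  as.numeric(x)
})

# Hypothetical helper: proportions among respondents who did not opt out
g_answered <- function(x) {
  n <- sum(!is.na(x))
  c(yes = sum(x == 1, na.rm = TRUE), no = sum(x == 0, na.rm = TRUE)) / n
}

pct_answered <- sapply(d_chr[choice_cols], g_answered)
pct_answered
```

Note that g_answered returns NaN for a column or group in which everyone opted out, since the denominator is then zero.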

Upvotes: 1

Simon O'Hanlon

Reputation: 59980

For something quick and dirty you might want to try looking into aggregate and prop.table like this:

#Some data:
df <- data.frame( group = c("A", "B", "C", "A", "C" ) , 
individual = c("1", "2", "3", "4", "5" ),
choice1 = c("1", "0", "1", "1", "NA"),
choice2 = c("1", "NA", "1", "0", "NA") ,
choice3 = c("1", "NA", "NA", "0", "NA") )

#Convert to ordered factor to keep order of values as 0<1<NA in all cases, no matter the order they appear in a column
df <- as.data.frame( lapply( df , factor , ordered = TRUE ) )

#Then aggregate by group and choice, and work out proportion of each response
# Order of levels is 0, then 1, then NA
# Caveat: the values in the output are not labelled, so read them with care.
# cbind() turns each factor into its integer codes (0 -> 1, 1 -> 2, NA -> 3), so
# prop.table() returns each row's code divided by the sum of the codes in its group,
# not the proportion of responses in each category.
aggregate( cbind( choice1 , choice2 , choice3 ) ~ group  , data = df , prop.table )

#group  choice1              choice2              choice3
#1     A 0.5, 0.5 0.6666667, 0.3333333 0.6666667, 0.3333333
#2     B        1                    1                    1
#3     C 0.4, 0.6             0.4, 0.6             0.5, 0.5
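Since the values in the aggregate() output above carry no labels, a labelled alternative may be easier to read. The sketch below is one possible approach, not the method used above: it fixes the factor levels to "0", "1" and "NA" (the text values from the sample data), cross-tabulates each choice against group with table(), and takes row proportions with prop.table(). The names lev and prop_by_group are made up for illustration.

```r
df <- data.frame(
  group = c("A", "B", "C", "A", "C"),
  choice1 = c("1", "0", "1", "1", "NA"),
  choice2 = c("1", "NA", "1", "0", "NA"),
  choice3 = c("1", "NA", "NA", "0", "NA"),
  stringsAsFactors = FALSE
)

# Fixed levels so every table has the same three labelled columns,
# even when a response never occurs in a given choice.
# "NA" here is the text value from the data, not a real missing value.
lev <- c("0", "1", "NA")

prop_by_group <- lapply(df[-1], function(col) {
  tab <- table(group = df$group, response = factor(col, levels = lev))
  prop.table(tab, margin = 1)  # proportions within each group (rows sum to 1)
})

prop_by_group$choice1
```

Each element of prop_by_group is a group-by-response table, so for example prop_by_group$choice1["C", "NA"] gives the share of group C that opted out of choice1.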

Upvotes: 0
