Reputation: 305
I am fairly familiar with R, but I have reached a point where my data demands require me to learn iterative loops with multiple conditions. I have seen examples that use various forms of *apply(), as well as colSums() and rowSums(), to perform the type(s) of data transformations I need, but I want to make these tasks more efficient, perhaps by nesting or iterating a loop. Also, existing recommendations do not account for the data lost when "NA" items are ignored or dropped, and I need to be able to retain this information.
My general data format is as follows:
group <- c("A", "B", "C", "A", "C" [...])
individual <- c("1", "2", "3", "4", "5" [...])
choice1 <- c("1", "0", "1", "1", "NA")
choice2 <- c("1", "NA", "1", "0", "NA")
[...]
choice10 <- c("1", "0", "1", "1", "NA")
For each choice, and within each group, I need to count the three possible responses (1 == yes, 0 == no, NA == opted out of that choice) and then convert these counts into percentages. Where I have had the most difficulty with previous methods like *apply() or summing across rows/columns is that my "NA" values (opt-outs) are ignored, which prevents me from properly calculating the percentage of each response across groups. Any specific advice or demonstration of how to either ignore OR retain the opt-outs/NAs within the loop structure would be greatly appreciated.
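A minimal sketch of the core problem, using a hypothetical vector shaped like choice2 above (the names here are illustrative only). Two things bite: the text "NA" is an ordinary string to R, not a missing value, and sum(..., na.rm = TRUE) quietly drops missing values, so opt-outs have to be counted separately with is.na() or kept explicitly with table()'s useNA argument.

```r
# Hypothetical vector shaped like choice2 in the question, stored as text
choice2 <- c("1", "NA", "1", "0", "NA")

# The string "NA" is an ordinary value, so is.na() returns FALSE everywhere
is.na(choice2)

# Turn the text "NA" into a real missing value and store the answers as integers
choice2_num <- as.integer(ifelse(choice2 == "NA", NA, choice2))

# na.rm = TRUE drops missing values from the yes/no sums,
# so the opt-outs are counted separately with is.na()
counts <- c(yes    = sum(choice2_num == 1, na.rm = TRUE),
            no     = sum(choice2_num == 0, na.rm = TRUE),
            optout = sum(is.na(choice2_num)))
counts                        # yes = 2, no = 1, optout = 2
counts / length(choice2_num)  # yes = 0.4, no = 0.2, optout = 0.4

# table() keeps an explicit <NA> cell when asked to
table(choice2_num, useNA = "always")
```

Dividing by length() rather than by the number of non-missing answers is what keeps the opt-outs in the percentages.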
The output would look something like the following:
yes.count_bychoice
no.count_bychoice
optout.count_bychoice
percentyes_bychoice_bygroup
percentno_bychoice_bygroup
percentout_bychoice_bygroup
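Since the question asks about loops specifically, here is a minimal sketch of the nested-loop version (outer loop over choices, inner loop over groups). It assumes the answers are numeric 0/1 with opt-outs stored as real NA values; the data frame and the column names (choice, pct_yes, and so on) are hypothetical stand-ins for the outputs listed above. Percentages use the full group size, so opt-outs stay in the denominator.

```r
# Hypothetical data in the question's layout; opt-outs are stored as real NA
d <- data.frame(
  group      = c("A", "B", "C", "A", "C"),
  individual = 1:5,
  choice1    = c(1, 0, 1, 1, NA),
  choice2    = c(1, NA, 1, 0, NA)
)
choice_cols <- grep("^choice", names(d), value = TRUE)

rows <- list()
for (ch in choice_cols) {               # outer loop: one pass per choice
  for (g in sort(unique(d$group))) {    # inner loop: one pass per group
    x <- d[d$group == g, ch]            # this group's answers to this choice
    n <- length(x)                      # group size, opt-outs included
    yes    <- sum(x == 1, na.rm = TRUE)
    no     <- sum(x == 0, na.rm = TRUE)
    optout <- sum(is.na(x))
    rows[[length(rows) + 1]] <- data.frame(
      choice = ch, group = g, yes = yes, no = no, optout = optout,
      pct_yes = 100 * yes / n, pct_no = 100 * no / n, pct_optout = 100 * optout / n
    )
  }
}
res <- do.call(rbind, rows)             # one row per (choice, group)

# Counts per choice, summed over all groups
by_choice <- aggregate(cbind(yes, no, optout) ~ choice, data = res, FUN = sum)
res
by_choice
```

Here res holds the per-choice, per-group counts and percentages, and by_choice the per-choice totals. A loop like this is easy to follow, but vectorised tools such as aggregate() do the same work with less code.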
Upvotes: 0
Views: 265
Reputation: 12819
First things first: build a data.frame like this:
d <- data.frame(group=group, individual=individual, choice1=choice1 ...)
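As an example, I'll use this:
# Random example data; no seed is set, so your values will differ from the output below
d <- data.frame(group=sample(LETTERS[1:4], 20, replace=TRUE), individual=1:20,
choice1=sample(c(0,1,NA), 20, replace=TRUE), choice2=sample(c(0,1,NA), 20, replace=TRUE))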
I get
> head(d)
group individual choice1 choice2
1 D 1 1 NA
2 A 2 NA NA
3 C 3 1 1
4 A 4 1 NA
5 B 5 0 NA
6 B 6 1 1
We are going to use the following functions:
f <- function(x) c(yes=sum(x==1,na.rm=TRUE),no=sum(x==0,na.rm=TRUE),optout=sum(is.na(x)))
for counting and
g <- function(x) f(x)/length(x)
for percentages.
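A quick check of f and g on a small hand-made vector (the vector x and the extra helper h are illustrative, not part of this answer). Because g divides by length(x), opt-outs stay in the denominator and the three shares sum to 1; h is a hypothetical variant for when you would rather ignore opt-outs and take yes/no percentages among respondents only.

```r
f <- function(x) c(yes = sum(x == 1, na.rm = TRUE), no = sum(x == 0, na.rm = TRUE), optout = sum(is.na(x)))
g <- function(x) f(x) / length(x)
# Hypothetical helper: yes/no shares among people who answered (opt-outs excluded)
h <- function(x) f(x)[c("yes", "no")] / sum(!is.na(x))

x <- c(1, 0, NA, 1)
f(x)  # yes = 2, no = 1, optout = 1
g(x)  # yes = 0.50, no = 0.25, optout = 0.25 (sums to 1)
h(x)  # yes = 2/3, no = 1/3
```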
For the global counts you can use:
counts <- apply(d[,-(1:2)], 2, FUN=f)
Result:
> counts
choice1 choice2
yes 11 8
no 4 2
optout 5 10
Switching the function to g gives the percentages:
> apply(d[,-(1:2)], 2, FUN=g)
choice1 choice2
yes 0.55 0.4
no 0.20 0.1
optout 0.25 0.5
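Because f counts opt-outs explicitly, each column of counts should add up to the number of rows and each column of percentages to 1, which is a cheap way to confirm that no NA was lost. A minimal sketch of that check, using a small hypothetical data frame with fixed values in place of the random d above:

```r
f <- function(x) c(yes = sum(x == 1, na.rm = TRUE), no = sum(x == 0, na.rm = TRUE), optout = sum(is.na(x)))
g <- function(x) f(x) / length(x)

# Hypothetical fixed data in the same layout as d
d <- data.frame(group = c("A", "B", "C", "A", "C"), individual = 1:5,
                choice1 = c(1, 0, 1, 1, NA), choice2 = c(1, NA, 1, 0, NA))

counts <- apply(d[, -(1:2)], 2, FUN = f)
pcts   <- apply(d[, -(1:2)], 2, FUN = g)

# No answers are lost: every column accounts for all rows
stopifnot(all(colSums(counts) == nrow(d)))
stopifnot(isTRUE(all.equal(unname(colSums(pcts)), rep(1, ncol(pcts)))))
counts
```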
To get the counts per group per choice you can use this:
counts_grp <- aggregate(d[,-(1:2)], by=list(group=d$group), FUN=f)
Result:
> counts_grp
group choice1.yes choice1.no choice1.optout choice2.yes choice2.no choice2.optout
1 A 1 0 3 2 0 2
2 B 3 2 0 3 1 1
3 C 4 0 2 3 0 3
4 D 3 2 0 0 1 4
For percentages you can simply switch the function:
> aggregate(d[,-(1:2)], by=list(group=d$group), FUN=g)
group choice1.yes choice1.no choice1.optout choice2.yes choice2.no choice2.optout
1 A 0.2500000 0.0000000 0.7500000 0.5 0.0 0.5
2 B 0.6000000 0.4000000 0.0000000 0.6 0.2 0.2
3 C 0.6666667 0.0000000 0.3333333 0.5 0.0 0.5
4 D 0.6000000 0.4000000 0.0000000 0.0 0.2 0.8
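As an alternative to aggregate() for a single choice, base R's table() with useNA = "ifany" plus prop.table(..., margin = 1) gives the same per-group shares laid out as a group-by-response table. A minimal sketch on hypothetical fixed data:

```r
# Hypothetical fixed data: one choice column, opt-outs stored as NA
d <- data.frame(group = c("A", "B", "C", "A", "C"), choice1 = c(1, 0, 1, 1, NA))

# Rows are groups, columns are answers; "ifany" adds an NA column because choice1 has NAs
tab <- table(group = d$group, choice1 = d$choice1, useNA = "ifany")
tab

# margin = 1 divides each row by its total, giving shares within each group
p <- prop.table(tab, margin = 1)
round(p, 2)
```

To cover every choice, the same two calls can be wrapped in lapply() over the choice columns.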
Upvotes: 1
Reputation: 59980
For something quick and dirty, you might want to look into aggregate and prop.table, like this:
#Some data:
df <- data.frame( group = c("A", "B", "C", "A", "C" ) ,
individual = c("1", "2", "3", "4", "5" ),
choice1 = c("1", "0", "1", "1", "NA"),
choice2 = c("1", "NA", "1", "0", "NA") ,
choice3 = c("1", "NA", "NA", "0", "NA") )
#Convert the choice columns to factors with a fixed set of levels ("0", "1" and the text "NA"),
#so that every choice reports all three responses, even ones that never occur in a group
choices <- c("choice1", "choice2", "choice3")
df[choices] <- lapply(df[choices], factor, levels = c("0", "1", "NA"))
#Then aggregate by group and work out the proportion of each response within each choice:
#table() counts every level (zero counts included) and prop.table() turns the counts into proportions,
#so each choice gets three labelled columns: the proportion of 0s, of 1s and of NAs
aggregate(df[choices], by = list(group = df$group), FUN = function(x) prop.table(table(x)))
#group choice1.0 choice1.1 choice1.NA choice2.0 choice2.1 choice2.NA choice3.0 choice3.1 choice3.NA
#1 A 0.0 1.0 0.0 0.5 0.5 0.0 0.5 0.5 0.0
#2 B 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0
#3 C 0.0 0.5 0.5 0.0 0.5 0.5 0.0 0.0 1.0
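The data here store opt-outs as the text "NA". If they are real NA values instead, a minimal variant (assuming numeric 0/1 answers) is to fix the levels to 0 and 1 and ask table() to always add an NA cell. The data frame df2 and the helper prop_by_answer are hypothetical names for this sketch:

```r
# Hypothetical data: numeric answers with real NA opt-outs
df2 <- data.frame(group   = c("A", "B", "C", "A", "C"),
                  choice1 = c(1, 0, 1, 1, NA),
                  choice2 = c(1, NA, 1, 0, NA))

# Proportion of 0s, 1s and NAs for one choice within one group;
# fixed levels plus useNA = "always" keep all three cells even when a count is zero
prop_by_answer <- function(x) prop.table(table(factor(x, levels = c(0, 1)), useNA = "always"))

# aggregate() on a data frame passes NAs in the value columns through to FUN
res2 <- aggregate(df2[c("choice1", "choice2")], by = list(group = df2$group), FUN = prop_by_answer)
res2
```

Using the data-frame form of aggregate() rather than the formula form matters here: the formula method's default na.action would drop the rows containing NA before the proportions are computed.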
Upvotes: 0