How can I combine values from three variables into one variable?

Question

I'm trying to create a single cataract variable from three different datasets that each asked about cataracts: a phone interview, a wave using a short questionnaire, and a wave using a longer questionnaire. The datasets have been merged, so participants have missing values for any wave they didn't take part in. I've coded each of the three separate cataract variables as 1 = YES and 0 = NO.

In the following code, I'm trying to say: if you responded yes (1) to any of the three variables, assign 1; otherwise, if you responded no (0) to any of them, assign 0; otherwise assign NA.

survey$cataract<-ifelse(survey$ew3_cat==1 | survey$lq3_catnum==1 | survey$sq3_cat==1,1,
                        ifelse(survey$ew3_cat==0 | survey$lq3_catnum==0 | survey$sq3_cat==0,0,NA))

As you can see from the following result, I get the 1's, but everything else is "NA", no zeros.

> table(survey$cataract,useNA="ifany")

    1  <NA>
10303 63322

Now, if I change the order, say do all the zeros first, then I get the correct 0's, but no 1's.

survey$cataract<-ifelse(survey$ew3_cat==0 | survey$lq3_catnum==0 | survey$sq3_cat==0,0,
                        ifelse(survey$ew3_cat==1 | survey$lq3_catnum==1 | survey$sq3_cat==1,1,NA))

> table(survey$cataract,useNA="ifany")

    0  <NA>
63315 10310

The correct count from the three separate vars should be:

10,303 = 1
63,315 = 0
7 = NA

I also tried replicating this problem with made-up data as follows:

x <- c(rep(1,100),rep(0,200),rep(NA,400))
y <- c(rep(NA,300),rep(1,100),rep(0,100),rep(NA,200))
z <- c(rep(NA,500),rep(1,100),rep(0,100))

cat <- ifelse(x==1|y==1|z==1,1,
         ifelse(x==0|y==0|z==0,0,NA))
> table(cat,useNA="ifany")
cat
   1 <NA>
 300  400

Same problem if I reverse the order:

cat <- ifelse(x==0|y==0|z==0,0,
         ifelse(x==1|y==1|z==1,1,NA))
> table(cat,useNA="ifany")
cat
   0 <NA>
 400  300

Any suggestions about what logical thing I'm missing here?

Thomas · Accepted Answer

This is a little hackish but should give you the right result:

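# 1 if any of x, y, z is 1 (NAs ignored), otherwise 0
tmp <- as.numeric(mapply(any, as.logical(x),as.logical(y),as.logical(z), na.rm=TRUE))
# restore NA where all three inputs are missing
tmp[which(mapply(all, is.na(x), is.na(y), is.na(z)))] <- NA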

Basically it looks for any values of 1, returning 1 for those values and 0 otherwise. Then it goes back and puts NA values back in wherever all of x, y, and z are NA.

> table(tmp)
tmp
  0   1 
400 300
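
For context on why the nested ifelse() in the question never reaches the 0 branch: in R, NA | TRUE is TRUE, but NA | FALSE is NA, and ifelse() returns NA wherever its test is NA. Because every row of the merged data has at least one missing wave, the "any 1" test is either TRUE or NA, never FALSE. A minimal sketch with made-up vectors (a, b and d are illustrative names, not from the question):

```r
# R's three-valued logic: NA | TRUE is TRUE, but NA | FALSE stays NA
NA | TRUE
NA | FALSE

# One row per pattern: a "yes", a "no", and an all-missing case
a <- c(1, 0, NA)
b <- c(NA, NA, NA)
d <- c(NA, NA, NA)

# The "no" row evaluates to NA here, not FALSE
any_yes <- a == 1 | b == 1 | d == 1
any_yes   # TRUE NA NA

# ifelse() propagates NA from the test, so the inner ifelse()
# is never consulted for the "no" row
res <- ifelse(any_yes, 1,
         ifelse(a == 0 | b == 0 | d == 0, 0, NA))
res       # 1 NA NA
```

The same reasoning applies with the order reversed: the "any 0" test is then TRUE or NA, so the 1's are lost instead.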

Note: Your example data aren't particularly good for testing this because they contain no cases where all three variables are missing (NA-NA-NA):

> ftable(x,y,z, useNA='always')
      z   0   1  NA
x  y               
0  0      0   0   0
   1      0   0   0
   NA     0   0 200
1  0      0   0   0
   1      0   0   0
   NA     0   0 100
NA 0      0   0 100
   1      0   0 100
   NA   100 100   0

So, here's a slightly modified version of your data that shows the above code works correctly:

x <- c(rep(1,100),rep(0,200),rep(NA,400))
y <- c(rep(NA,300),rep(1,100),rep(0,100),rep(NA,200))
z <- c(rep(NA,500),rep(1,100),rep(0,50),rep(NA,50))

The result for those data:

> ftable(x,y,z, useNA='always')
      z   0   1  NA
x  y               
0  0      0   0   0
   1      0   0   0
   NA     0   0 200
1  0      0   0   0
   1      0   0   0
   NA     0   0 100
NA 0      0   0 100
   1      0   0 100
   NA    50 100  50

> table(tmp, useNA='always')
tmp
   0    1 <NA>
 350  300   50
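
As an alternative sketch (not part of the approach above), base R's pmax() with na.rm = TRUE may give the same result in one step when the variables are coded strictly as 0/1: the element-wise maximum is 1 if any value is 1, 0 if the only non-missing values are 0, and NA only when all three are missing. The name combined is illustrative.

```r
# Modified example data from above (includes 50 all-NA rows)
x <- c(rep(1, 100), rep(0, 200), rep(NA, 400))
y <- c(rep(NA, 300), rep(1, 100), rep(0, 100), rep(NA, 200))
z <- c(rep(NA, 500), rep(1, 100), rep(0, 50), rep(NA, 50))

# Element-wise maximum, ignoring NAs; stays NA only where x, y and z are all NA
combined <- pmax(x, y, z, na.rm = TRUE)

# Tabulate, keeping the NA count visible
table(combined, useNA = "always")
```

This relies on the 0/1 coding; with any other coding the maximum would not correspond to "any yes".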
