user3314465

Reputation: 3

How can I combine values from three variables into one variable?

What I'm trying to do is create a single cataract variable from three different datasets that asked about cataract (a phone interview, a wave using a short questionnaire, and a wave using a longer questionnaire). The datasets have been merged, so each participant has missing values for the waves they didn't take part in. I've coded each of the three separate cataract variables as 1 = YES and 0 = NO.

In the following code, I'm trying to say: if you respond yes (1) to any of the three variables, give a value of 1; otherwise, if you respond no (0) to any of them, give a value of 0; otherwise, give NA.

survey$cataract<-ifelse(survey$ew3_cat==1 | survey$lq3_catnum==1 | survey$sq3_cat==1,1,
                        ifelse(survey$ew3_cat==0 | survey$lq3_catnum==0 | survey$sq3_cat==0,0,NA))

As you can see from the following result, I get the 1s, but everything else is NA; there are no zeros.

> table(survey$cataract,useNA="ifany")

    1  <NA> 
10303 63322 

Now, if I change the order and test for the zeros first, I get the correct 0s, but no 1s.

survey$cataract<-ifelse(survey$ew3_cat==0 | survey$lq3_catnum==0 | survey$sq3_cat==0,0,
                        ifelse(survey$ew3_cat==1 | survey$lq3_catnum==1 | survey$sq3_cat==1,1,NA))

> table(survey$cataract,useNA="ifany")

    0  <NA> 
63315 10310 

The correct counts from the three separate variables should be:

10,303 = 1
63,315 = 0
7 = NA

I also tried replicating this problem with made-up data as follows:

x <- c(rep(1,100),rep(0,200),rep(NA,400))
y <- c(rep(NA,300),rep(1,100),rep(0,100),rep(NA,200))
z <- c(rep(NA,500),rep(1,100),rep(0,100))

cat <- ifelse(x==1|y==1|z==1,1,
         ifelse(x==0|y==0|z==0,0,NA))
> table(cat,useNA="ifany")
cat
   1 <NA> 
 300  400 

Same problem if I reverse the order:

cat <- ifelse(x==0|y==0|z==0,0,
         ifelse(x==1|y==1|z==1,1,NA))
> table(cat,useNA="ifany")
cat
   0 <NA> 
 400  300

Any suggestions about what logical thing I'm missing here?

Upvotes: 0

Views: 388

Answers (1)

Thomas

Reputation: 44555

This is a little hackish but should give you the right result:

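# 1 if any of x, y, z is 1 (NAs ignored), otherwise 0
tmp <- as.numeric(mapply(any, as.logical(x),as.logical(y),as.logical(z), na.rm=TRUE))
# put NA back in rows where x, y and z are all NA
tmp[which(mapply(all, is.na(x), is.na(y), is.na(z)))] <- NA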

Basically, the first line returns 1 for any row where at least one of x, y, and z is 1 (ignoring NAs), and 0 otherwise. The second line then puts NA back in wherever all of x, y, and z are NA.
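For context on why the nested `ifelse()` in the question loses values: in R, any comparison with `NA` returns `NA`, and `|` only gives a definite answer when at least one operand is `TRUE` (`NA | TRUE` is `TRUE`, but `NA | FALSE` is `NA`). For a row such as x = 0, y = NA, z = NA, the outer test `x==1|y==1|z==1` is therefore `NA`, and `ifelse()` returns `NA` for an `NA` test instead of falling through to the inner `ifelse()`. The sketch below illustrates this and shows one possible fix (not the answer's method) that swaps `==` for `%in%`, which returns `FALSE` rather than `NA` for missing values; the helper `combine_in` and the variable `cat_in` are hypothetical names used only for illustration.

```r
# Three-valued logic: comparisons with NA give NA
NA == 1           # NA
NA | TRUE         # TRUE  (one TRUE operand decides the result)
NA | FALSE        # NA    (the result depends on the unknown value)
ifelse(NA, 1, 0)  # NA    (an NA test never reaches the 'no' branch)
NA %in% 1         # FALSE (%in% is built on match() and never returns NA)

# The question's made-up data
x <- c(rep(1,100),rep(0,200),rep(NA,400))
y <- c(rep(NA,300),rep(1,100),rep(0,100),rep(NA,200))
z <- c(rep(NA,500),rep(1,100),rep(0,100))

# Hypothetical helper: the question's nested ifelse() with == replaced by %in%
combine_in <- function(x, y, z) {
  ifelse(x %in% 1 | y %in% 1 | z %in% 1, 1,
         ifelse(x %in% 0 | y %in% 0 | z %in% 0, 0, NA))
}

cat_in <- combine_in(x, y, z)
table(cat_in, useNA = "ifany")  # 400 zeros, 300 ones, no NAs
```

Rows where all three inputs are NA fail both `%in%` tests and come out as NA, which is the intended third category.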

> table(tmp)
tmp
  0   1 
400 300

Note: Your example data don't seem particularly good for testing this because you have no cases that are NA-NA-NA:

> ftable(x,y,z, useNA='always')
      z   0   1  NA
x  y               
0  0      0   0   0
   1      0   0   0
   NA     0   0 200
1  0      0   0   0
   1      0   0   0
   NA     0   0 100
NA 0      0   0 100
   1      0   0 100
   NA   100 100   0
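Whether the example data contain any rows with all three variables missing can also be checked directly by counting them; with the question's original x, y and z the count is 0, matching the empty NA/NA/NA cell in the table above (`n_all_na` is an illustrative name):

```r
x <- c(rep(1,100),rep(0,200),rep(NA,400))
y <- c(rep(NA,300),rep(1,100),rep(0,100),rep(NA,200))
z <- c(rep(NA,500),rep(1,100),rep(0,100))

# Number of rows where x, y and z are all missing
n_all_na <- sum(is.na(x) & is.na(y) & is.na(z))
n_all_na  # 0
```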

So here's a slightly modified version of your data, with 50 NA-NA-NA cases, that shows the above code works correctly:

x <- c(rep(1,100),rep(0,200),rep(NA,400))
y <- c(rep(NA,300),rep(1,100),rep(0,100),rep(NA,200))
z <- c(rep(NA,500),rep(1,100),rep(0,50),rep(NA,50))

The result for those data:

> ftable(x,y,z, useNA='always')
      z   0   1  NA
x  y               
0  0      0   0   0
   1      0   0   0
   NA     0   0 200
1  0      0   0   0
   1      0   0   0
   NA     0   0 100
NA 0      0   0 100
   1      0   0 100
   NA    50 100  50

> table(tmp, useNA='always')
tmp
   0    1 <NA> 
 350  300   50 
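As a cross-check on the modified data, the `mapply()` result can be compared with `pmax(..., na.rm = TRUE)`, a different technique swapped in here only for comparison. Both should give 350 zeros, 300 ones and 50 NAs, with the NAs in the same rows (`combined` and `same` are illustrative names):

```r
x <- c(rep(1,100),rep(0,200),rep(NA,400))
y <- c(rep(NA,300),rep(1,100),rep(0,100),rep(NA,200))
z <- c(rep(NA,500),rep(1,100),rep(0,50),rep(NA,50))

# The answer's approach: any 1 -> 1, else 0; then NA where all three are NA
tmp <- as.numeric(mapply(any, as.logical(x), as.logical(y), as.logical(z), na.rm = TRUE))
tmp[which(mapply(all, is.na(x), is.na(y), is.na(z)))] <- NA

# Alternative: element-wise max ignoring NAs
combined <- pmax(x, y, z, na.rm = TRUE)

# Same rows missing, same values everywhere else
same <- identical(is.na(tmp), is.na(combined)) && all(tmp == combined, na.rm = TRUE)
same                               # TRUE
table(combined, useNA = "always")  # 350 zeros, 300 ones, 50 NAs
```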

Upvotes: 0
