Reputation: 3
What I'm trying to do is create a single cataract variable from three different datasets that each asked about cataracts (a phone interview, a wave using a short questionnaire, and a wave using a longer questionnaire). These datasets have been merged, so each participant has missing values for the waves they didn't participate in. I've coded each of the three separate cataract variables as 1 = YES and 0 = NO.
In the following code, I'm trying to say: if you responded yes (1) to any of the three variables, assign 1; otherwise, if you responded no (0) to any of them, assign 0; otherwise assign NA.
survey$cataract<-ifelse(survey$ew3_cat==1 | survey$lq3_catnum==1 | survey$sq3_cat==1,1,
ifelse(survey$ew3_cat==0 | survey$lq3_catnum==0 | survey$sq3_cat==0,0,NA))
As you can see from the following result, I get the 1's, but everything else is "NA", no zeros.
> table(survey$cataract,useNA="ifany")
1 <NA>
10303 63322
Now, if I change the order and test for the zeros first, I get the correct 0's, but no 1's.
survey$cataract<-ifelse(survey$ew3_cat==0 | survey$lq3_catnum==0 | survey$sq3_cat==0,0,
ifelse(survey$ew3_cat==1 | survey$lq3_catnum==1 | survey$sq3_cat==1,1,NA))
> table(survey$cataract,useNA="ifany")
0 <NA>
63315 10310
The correct count from the three separate vars should be:
10,303 = 1
63,315 = 0
7 = NA
I also tried replicating this problem with made-up data as follows:
x <- c(rep(1,100),rep(0,200),rep(NA,400))
y <- c(rep(NA,300),rep(1,100),rep(0,100),rep(NA,200))
z <- c(rep(NA,500),rep(1,100),rep(0,100))
cat <- ifelse(x==1|y==1|z==1,1,
ifelse(x==0|y==0|z==0,0,NA))
> table(cat,useNA="ifany")
cat
1 <NA>
300 400
Same problem if I reverse the order:
cat <- ifelse(x==0|y==0|z==0,0,
ifelse(x==1|y==1|z==1,1,NA))
> table(cat,useNA="ifany")
cat
0 <NA>
400 300
Any suggestions about what logical thing I'm missing here?
Upvotes: 0
Views: 388
Reputation: 44555
This is a little hackish but should give you the right result:
tmp <- as.numeric(mapply(any, as.logical(x),as.logical(y),as.logical(z), na.rm=TRUE))
tmp[which(mapply(all, is.na(x), is.na(y), is.na(z)))] <- NA
Basically, it looks for any values of 1, returning 1 for those cases and 0 otherwise. Then it goes back and puts NA values back in wherever all of x, y, and z are NA.
> table(tmp)
tmp
0 1
400 300
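As background on why the nested ifelse() in the question returns only one value plus NA: in R, a comparison with NA yields NA, and FALSE | NA is also NA. So for any row where the first condition is not TRUE and at least one variable is missing, ifelse() receives an NA test and returns NA without ever reaching the nested call. Below is a minimal sketch of this using the question's made-up data, together with one possible workaround that swaps == for %in% (which never returns NA). The name `cataract` is just an illustrative choice.

```r
# Comparisons involving NA propagate NA through `|` unless one side is TRUE
NA == 1      # NA
TRUE | NA    # TRUE
FALSE | NA   # NA

# The question's made-up data
x <- c(rep(1, 100), rep(0, 200), rep(NA, 400))
y <- c(rep(NA, 300), rep(1, 100), rep(0, 100), rep(NA, 200))
z <- c(rep(NA, 500), rep(1, 100), rep(0, 100))

# %in% returns FALSE (not NA) for missing values, so the test is never NA
cataract <- ifelse(x %in% 1 | y %in% 1 | z %in% 1, 1,
            ifelse(x %in% 0 | y %in% 0 | z %in% 0, 0, NA))

table(cataract, useNA = "ifany")
```

On these data this should give the same 400 zeros and 300 ones as tmp above.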
Note: Your example data don't seem particularly good for testing this because you have no cases that are NA-NA-NA:
> ftable(x,y,z, useNA='always')
      z   0   1  NA
x  y
0  0      0   0   0
   1      0   0   0
   NA     0   0 200
1  0      0   0   0
   1      0   0   0
   NA     0   0 100
NA 0      0   0 100
   1      0   0 100
   NA   100 100   0
So, here's a slightly modified version of your data that shows the above code works correctly:
x <- c(rep(1,100),rep(0,200),rep(NA,400))
y <- c(rep(NA,300),rep(1,100),rep(0,100),rep(NA,200))
z <- c(rep(NA,500),rep(1,100),rep(0,50),rep(NA,50))
The result for those data (after recomputing tmp with the two lines above):
> ftable(x,y,z, useNA='always')
      z   0   1  NA
x  y
0  0      0   0   0
   1      0   0   0
   NA     0   0 200
1  0      0   0   0
   1      0   0   0
   NA     0   0 100
NA 0      0   0 100
   1      0   0 100
   NA    50 100  50
> table(tmp, useNA='always')
tmp
0 1 <NA>
350 300 50
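As an alternative sketch (not part of the approach above), base R's pmax() with na.rm = TRUE appears to compute the same thing for 0/1-coded variables in one vectorized call: the parallel maximum is 1 if any value is 1, 0 if the only non-missing values are 0, and NA only when all three are missing. The name `tmp2` is illustrative.

```r
# The modified example data, with 50 all-NA cases at the end
x <- c(rep(1, 100), rep(0, 200), rep(NA, 400))
y <- c(rep(NA, 300), rep(1, 100), rep(0, 100), rep(NA, 200))
z <- c(rep(NA, 500), rep(1, 100), rep(0, 50), rep(NA, 50))

# Element-wise maximum, ignoring NAs; stays NA only where x, y and z are all NA
tmp2 <- pmax(x, y, z, na.rm = TRUE)

table(tmp2, useNA = "always")
```

On these data this should reproduce the 350 / 300 / 50 split shown above. Assuming the real variables are numeric 0/1, the same call would apply to the survey columns, e.g. pmax(survey$ew3_cat, survey$lq3_catnum, survey$sq3_cat, na.rm = TRUE).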
Upvotes: 0