victoria

Reputation: 43

Recoding with ifelse() and an OR statement

I am trying to generate a new variable, testAB, as follows:

If testA is 1 and testB is 1 ==> code testAB as 1

If testA is 1 and testB is missing or 0 ==> code testAB as 1

If testA is missing or 0 and testB is 1 ==> code testAB as 1

If testA is 0 and testB is 0 ==> code testAB as 0

If testA is missing and testB is missing ==> code testAB as NA

The code I came up with (shown below) does not work. It seems to generate a 1 only if testA and testB are both 1, and NA otherwise. What do you recommend? Thank you!

df2$testAB<-ifelse((df1$testA == 1) | (df1$testB == 1),1,0),1, 0,NA))
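A likely source of the unexpected NAs is how R treats missing values in comparisons: NA == 1 is NA rather than FALSE, and | and & only give a definite answer when the missing value cannot change it. ifelse() then returns NA wherever its test is NA. A minimal sketch, using small hypothetical vectors rather than the asker's data:

```r
# Comparisons involving NA are themselves NA
print(NA == 1)      # NA
# | and & return NA only when the missing value could change the answer
print(TRUE | NA)    # TRUE
print(FALSE | NA)   # NA
print(FALSE & NA)   # FALSE

# Hypothetical inputs: in each position at most one value is present
testA <- c(1, 0, NA)
testB <- c(NA, NA, 0)

# ifelse() returns NA wherever the test is NA, instead of falling back to 0
res <- ifelse(testA == 1 | testB == 1, 1, 0)
print(res)          # 1 NA NA
```

Only the first test (TRUE | NA) can be resolved; the other two are NA, so those rows become NA rather than 0.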

Upvotes: 0

Views: 994

Answers (2)

Mark K

Reputation: 199

You need, minimally, n-1 ifelse() statements for n unique outcomes.

To simplify the problem, group the criteria for each outcome with OR (|).
In your case, outcome 1 is:

(df$testA == 1 & df$testB == 1) |  
(df$testA == 1 & (is.na(df$testB) | df$testB == 0)) |
((is.na(df$testA) | df$testA == 0) & df$testB == 1)  

0: testA == 0 & testB == 0

NA: is.na(testA) & is.na(testB)
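These criteria can be checked against every combination of NA, 0 and 1. A minimal sketch, assuming the criteria are rewritten with %in% (which returns FALSE rather than NA for a missing value, so each rule is strictly TRUE or FALSE); is1 and is0_or_na are hypothetical helper names used only for illustration:

```r
# All nine combinations of NA, 0 and 1 (same grid as used in this answer)
grid <- expand.grid(testA = c(NA, 0, 1), testB = c(NA, 0, 1))
a <- grid$testA
b <- grid$testB

# Hypothetical helpers: %in% never returns NA, so each rule is TRUE or FALSE
is1       <- function(x) x %in% 1
is0_or_na <- function(x) x %in% c(0, NA)

rule1  <- (is1(a) & is1(b)) | (is1(a) & is0_or_na(b)) | (is0_or_na(a) & is1(b))
rule0  <- a %in% 0 & b %in% 0
ruleNA <- is.na(a) & is.na(b)

# Rows that none of the three rules cover
grid[!(rule1 | rule0 | ruleNA), ]
```

The uncovered rows are 2 and 4 (one test 0, the other missing), which is why different approaches can reasonably return either NA or 0 for them.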

With n-1 statements you never have to write out the most complicated condition: it simply becomes the final "else" value. So the logic below is: define all NA cases first, then all 0 cases, and everything else is 1.

df <- expand.grid(testA = c(NA, 0, 1), testB = c(NA, 0, 1))

df$testAB = ifelse(is.na(df$testA) & is.na(df$testB), NA,
              ifelse(df$testA == 0 & df$testB == 0, 0, 1))

Outcome:

  testA testB testAB
1    NA    NA     NA
2     0    NA     NA
3     1    NA      1
4    NA     0     NA
5     0     0      0
6     1     0      1
7    NA     1      1
8     0     1      1
9     1     1      1
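In this outcome, rows 2 and 4 (one test 0, the other missing) are NA because, for example, 0 == 0 & NA == 0 evaluates to NA. The question does not say what those rows should be. If they should be 0, one option (a swapped-in variant, not part of the original answer) is to test with %in%, which returns FALSE rather than NA for a missing value:

```r
df <- expand.grid(testA = c(NA, 0, 1), testB = c(NA, 0, 1))

# Both missing -> NA; otherwise 1 if either test is 1, else 0.
# %in% treats NA as "not 1", so the second argument is never NA.
df$testAB <- ifelse(is.na(df$testA) & is.na(df$testB), NA,
                    as.numeric(df$testA %in% 1 | df$testB %in% 1))
df
```

Only row 1 (both missing) stays NA; rows 2 and 4 become 0 and the remaining rows match the table above.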

Tidyverse version:

library(tidyverse)

df <- expand.grid(testA = c(NA, 0, 1), testB = c(NA, 0, 1))

df <- df %>%
  mutate(testAB = ifelse(is.na(testA) & is.na(testB), NA,
                         ifelse(testA == 0 & testB == 0, 0, 1)))

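To check your own logic, you can make every condition explicit, with "error" as the final fallback instead of 1. Because "error" is a character string, testAB can come out as a character column (it does on the expand.grid data above), so this version is for checking the rules rather than for producing the final variable: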

df$testAB = ifelse(is.na(df$testA) & is.na(df$testB),NA,
              ifelse(df$testA == 0 & df$testB == 0, 0,
                     ifelse((df$testA == 1 & df$testB == 1) |
                            (df$testA == 1 & (is.na(df$testB) | df$testB == 0)) |
                            ((is.na(df$testA) | df$testA == 0) & df$testB == 1),1,
                            "error")))

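Because the fallback value "error" is a character string, the explicit version returns character values. A minimal sketch of how it might be checked against the compact version: confirm that no row fell through to "error", then convert back to numeric and compare. The names compact and explicit are hypothetical, used here for illustration:

```r
# Same grid of NA/0/1 combinations as above
df <- expand.grid(testA = c(NA, 0, 1), testB = c(NA, 0, 1))

# Compact version: NA first, then 0, everything else 1
compact <- ifelse(is.na(df$testA) & is.na(df$testB), NA,
                  ifelse(df$testA == 0 & df$testB == 0, 0, 1))

# Explicit version with an "error" fallback for rows no rule covers
explicit <- ifelse(is.na(df$testA) & is.na(df$testB), NA,
              ifelse(df$testA == 0 & df$testB == 0, 0,
                     ifelse((df$testA == 1 & df$testB == 1) |
                            (df$testA == 1 & (is.na(df$testB) | df$testB == 0)) |
                            ((is.na(df$testA) | df$testA == 0) & df$testB == 1), 1,
                            "error")))

print(class(explicit))                          # "character"
print(any(explicit == "error", na.rm = TRUE))   # FALSE: no row fell through
print(identical(as.numeric(explicit), compact)) # TRUE
```

Rows whose earlier tests are NA still come out as NA rather than "error", so an "error"-free result only means that every row with definite test results was classified.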
Upvotes: 0

manotheshark

Reputation: 4357

This should give you what you're looking for:

df1 <- data.frame(testA = c(1, 1, 1, 0, 0, 0, NA, NA, NA),
                  testB = c(0, 1, NA, 0, 1, NA, 0, 1, NA))

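# ind is TRUE unless both testA and testB are missing
ind <- is.na(df1$testA) + is.na(df1$testB) < 2
# rows where both tests are missing stay NA
df1$testAB <- NA_real_
# otherwise 1 if either test is 1, else 0 (a single missing value is ignored)
df1$testAB[ind] <- as.numeric(as.logical(rowSums(df1[ind, c("testA", "testB")], na.rm = TRUE)))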

> df1
  testA testB testAB
1     1     0      1
2     1     1      1
3     1    NA      1
4     0     0      0
5     0     1      1
6     0    NA      0
7    NA     0      0
8    NA     1      1
9    NA    NA     NA
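When testA and testB only ever take the values 0, 1 or NA, the same result can likely be had with pmax(), the element-wise maximum: with na.rm = TRUE a single missing value is ignored, and a row where both values are missing stays NA. This is an alternative sketch, assuming strictly 0/1 codes (any other value, such as 2, would pass through unchanged rather than being recoded):

```r
df1 <- data.frame(testA = c(1, 1, 1, 0, 0, 0, NA, NA, NA),
                  testB = c(0, 1, NA, 0, 1, NA, 0, 1, NA))

# For 0/1 data the row-wise maximum is 1 if either test is 1, else 0;
# na.rm = TRUE drops a single NA, and an all-NA row stays NA
df1$testAB <- pmax(df1$testA, df1$testB, na.rm = TRUE)
df1
```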

Upvotes: 1
