Reputation: 43
I am trying to generate a new variable as follows:
if value for testA is 1 and value for testB is 1 ==> code testAB as 1
if value for testA is 1 and value for testB is missing or 0 ==> code testAB as 1
if value for testA is missing or 0 and value for testB is 1 ==> code testAB as 1
if value for testA is 0 and value for testB is 0 ==> code testAB as 0
if value for testA is missing and value for testB is missing ==> code testAB as NA
The code I came up with, shown below, does not work. It seems to generate a 1 only if testA and testB are both 1, and NA otherwise. What do you recommend? Thank you!
df2$testAB<-ifelse((df1$testA == 1) | (df1$testB == 1),1,0),1, 0,NA))
Upvotes: 0
Views: 994
Reputation: 199
You need, minimally, n-1 ifelse() statements for n unique outcomes.
To simplify the problem, group your criteria for each outcome with or (|).
In your case:
1:
  (df$testA == 1 & df$testB == 1) |
  (df$testA == 1 & (is.na(df$testB) | df$testB == 0)) |
  ((is.na(df$testA) | df$testA == 0) & df$testB == 1)
0: testA == 0 & testB == 0
NA: is.na(testA) & is.na(testB)
With n-1 statements you don't have to write out the most complicated condition. The logic for the following is: define all NA first, then all 0, and everything else is 1.
df <- expand.grid(testA = c(NA, 0, 1), testB = c(NA, 0, 1))
df$testAB <- ifelse(is.na(df$testA) & is.na(df$testB), NA,
                    ifelse(df$testA == 0 & df$testB == 0, 0, 1))
Outcome:
  testA testB testAB
1    NA    NA     NA
2     0    NA     NA
3     1    NA      1
4    NA     0     NA
5     0     0      0
6     1     0      1
7    NA     1      1
8     0     1      1
9     1     1      1
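Note that rows 2 and 4 (one value is 0, the other is missing) are not covered by the rules in the question, and the code above returns NA for them. A minimal base R sketch of why, using made-up variable names (`mixed`, `one_na`) purely for illustration: a comparison with NA is NA, `TRUE & NA` stays NA, `FALSE & NA` is FALSE, and ifelse() returns NA wherever its test is NA.

```r
# testA == 0, testB missing: the inner test is TRUE & NA, which is NA
mixed <- (0 == 0) & (NA == 0)

# testA == 1, testB missing: the inner test is FALSE & NA, which is FALSE
one_na <- (1 == 0) & (NA == 0)

# ifelse() passes an NA test straight through as NA
res_mixed  <- ifelse(mixed, 0, 1)
res_one_na <- ifelse(one_na, 0, 1)

print(mixed)       # NA
print(one_na)      # FALSE
print(res_mixed)   # NA
print(res_one_na)  # 1
```

If you want those mixed rows coded as 0 instead, the missing values have to be handled explicitly, for example with is.na() in the condition.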
Tidyverse version:
library(tidyverse)
df <- expand.grid(testA = c(NA, 0, 1), testB = c(NA, 0, 1))
df <- df %>%
  mutate(testAB = ifelse(is.na(testA) & is.na(testB), NA,
                         ifelse(testA == 0 & testB == 0, 0, 1))
  )
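As an alternative to nested ifelse() calls, here is a sketch using dplyr's case_when(), which checks the conditions in order and treats an NA condition as "no match". This is not part of the code above and it behaves differently for the mixed rows: a 0 paired with a missing value falls through to the last branch and is coded 0 rather than NA. It assumes the columns only contain 0, 1, or NA.

```r
library(dplyr)

df <- expand.grid(testA = c(NA, 0, 1), testB = c(NA, 0, 1))

df <- df %>%
  mutate(testAB = case_when(
    is.na(testA) & is.na(testB) ~ NA_real_,  # both missing -> NA
    testA == 1 | testB == 1     ~ 1,         # at least one positive -> 1
    TRUE                        ~ 0          # everything else -> 0
  ))

print(df)
```

The branches must all return the same type, which is why the first one uses NA_real_ rather than a plain NA.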
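To test your own logic, you can make all arguments explicit, so that any row matching none of the conditions is labelled "error". Be aware that mixing numbers with the string "error" can turn the result into a character column:
df$testAB <- ifelse(is.na(df$testA) & is.na(df$testB), NA,
             ifelse(df$testA == 0 & df$testB == 0, 0,
             ifelse((df$testA == 1 & df$testB == 1) |
                    (df$testA == 1 & (is.na(df$testB) | df$testB == 0)) |
                    ((is.na(df$testA) | df$testA == 0) & df$testB == 1), 1,
                    "error")))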
Upvotes: 0
Reputation: 4357
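This should get you what you're looking for:
df1 <- data.frame(testA = c(1, 1, 1, 0, 0, 0, NA, NA, NA),
                  testB = c(0, 1, NA, 0, 1, NA, 0, 1, NA))
# TRUE for rows where at least one of the two tests is not missing
ind <- is.na(df1$testA) + is.na(df1$testB) < 2
# create the column first, so rows with both tests missing stay NA
df1$testAB <- NA_real_
# sum only the two test columns; any non-zero sum becomes 1
df1$testAB[ind] <- as.numeric(as.logical(rowSums(df1[ind, c("testA", "testB")], na.rm = TRUE)))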
> df1
  testA testB testAB
1     1     0      1
2     1     1      1
3     1    NA      1
4     0     0      0
5     0     1      1
6     0    NA      0
7    NA     0      0
8    NA     1      1
9    NA    NA     NA
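For what it's worth, if the two columns only ever hold 0, 1, or NA (an assumption, not something checked here), the same result can be sketched with base R's pmax(). With na.rm = TRUE it takes the row-wise maximum of the non-missing values and still returns NA when both are missing. This is offered as an alternative, not as the method used above.

```r
df1 <- data.frame(testA = c(1, 1, 1, 0, 0, 0, NA, NA, NA),
                  testB = c(0, 1, NA, 0, 1, NA, 0, 1, NA))

# row-wise maximum, ignoring a single missing value;
# stays NA only when both testA and testB are missing
df1$testAB <- pmax(df1$testA, df1$testB, na.rm = TRUE)

print(df1)
```

As with the code above, a 0 paired with a missing value is coded 0 here.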
Upvotes: 1