Reputation: 163
I'm creating a new variable that uses conditional "ifelse" statements based on the values of several source variables.
The source variables are a survey question that changed names over successive yearly editions of the survey. So when one of the variables has a value, the other variables, corresponding to other years of the survey, have missing ("NA") values.
The data are:
A B
[1,] 1 NA
[2,] 2 NA
[3,] NA NA
[4,] NA 1
[5,] NA 2
What I want to do is create "C" according to these rules:
If A = 1 or B = 1, C = 1
If A = 2 or B = 2, C = 2
If A and B equal something other than 1 or 2, C = NA
This is what I get:
A B C
1: 1 NA 1
2: 2 NA NA
3: NA NA NA
4: NA 1 1
5: NA 2 NA
So, in lines 2 and 5, C should be "2", not "NA".
Here's a reproducible example:
require(tidyverse)
require(dplyr)
require(data.table)
#Create Data
A <- as.numeric(c(1,2,NA,NA,NA))
B <- as.numeric(c(NA,NA,NA,1,2))
df_ <- cbind(A,B)
#Conditional Recode with dplyr
df2 <- df_ %>% data.frame() %>%
mutate(C = ifelse(A == 1 | B == 1, 1,
ifelse(A == 2 | B == 2, 2, NA)))
#Conditional Recode with data.table()
df2 <- df_ %>% data.table() %>%
.[ , C := ifelse(A == 1 | B == 1, 1,
ifelse(A == 2 | B == 2, 2, NA))]
I suspect it has something to do with the way R handles NAs.
Can anyone throw me a bone, here? Thanks! :)
Upvotes: 2
Views: 68
Reputation: 17174
Use %in%
, not ==
when NA
's are in the mix:
df_ %>% data.frame() %>%
mutate(C = ifelse(A %in% 1 | B %in% 1, 1,
ifelse(A %in% 2 | B %in% 2, 2, NA)))
# A B C
# 1 1 NA 1
# 2 2 NA 2
# 3 NA NA NA
# 4 NA 1 1
# 5 NA 2 2
The reason is that ==
returns NA
whenever the element is NA
because equality comparison is undefined, so returns NA
- whereas %in%
is testing each element in a set (though here the "set" only has one value). Since there is no direct match, it returns FALSE
. You can see this with a simpler example here:
x <- c("A", "B", NA, "D")
x == "A"
# [1] TRUE FALSE NA FALSE
x %in% "A"
# [1] TRUE FALSE FALSE FALSE
Also, you may want to use dplyr::case_when
instead of nested ifelse
:
df_ %>% data.frame() %>%
mutate(C = case_when(
A %in% 1 | B %in% 1 ~ 1,
A %in% 2 | B %in% 2 ~ 2
))
# A B C
# 1 1 NA 1
# 2 2 NA 2
# 3 NA NA NA
# 4 NA 1 1
# 5 NA 2 2
Upvotes: 2
Reputation: 78907
Another apporach is using is.na()
:
library(dplyr)
df1 <- df %>%
as.data.frame() %>%
mutate(C = ifelse(!is.na(A) & A == 1 | !is.na(B) & B == 1, 1,
ifelse(!is.na(A) & A == 2 | !is.na(B) & B == 2, 2, NA)))
df1
library(data.table)
df2 <- data.table(df1)
df2[, C := ifelse(!is.na(A) & A == 1 | !is.na(B) & B == 1, 1,
ifelse(!is.na(A) & A == 2 | !is.na(B) & B == 2, 2, NA))]
df2
A B C
1 1 NA 1
2 2 NA 2
3 NA NA NA
4 NA 1 1
5 NA 2 2
A B C
1: 1 NA 1
2: 2 NA 2
3: NA NA NA
4: NA 1 1
5: NA 2 2
Upvotes: 1