David Crow
David Crow

Reputation: 163

Why aren't my conditional ifelse recodes returning the expected values?

I'm creating a new variable that uses conditional "ifelse" statements based on the values of several source variables.

The source variables are a survey question that changed names over successive yearly editions of the survey. So when one of the variables has a value, the other variables, corresponding to other years of the survey, have missing ("NA") values.

The data are:

      A  B
[1,]  1 NA
[2,]  2 NA
[3,] NA NA
[4,] NA  1
[5,] NA  2

What I want to do is create "C" according to these rules:

 If A = 1 or B = 1, C = 1
 If A = 2 or B = 2, C = 2
 If A and B equal something other than 1 or 2, C = NA

This is what I get:

    A  B  C
1:  1 NA  1
2:  2 NA NA
3: NA NA NA
4: NA  1  1
5: NA  2 NA

So, in lines 2 and 5, C should be "2", not "NA".

Here's a reproducible example:

require(tidyverse)
require(dplyr)
require(data.table)

#Create Data
A <- as.numeric(c(1,2,NA,NA,NA))
B <- as.numeric(c(NA,NA,NA,1,2))
df_ <- cbind(A,B)

#Conditional Recode with dplyr
df2 <- df_ %>% data.frame() %>%
  mutate(C = ifelse(A == 1 | B == 1, 1, 
              ifelse(A == 2 | B == 2, 2, NA)))

#Conditional Recode with data.table()
df2 <- df_ %>% data.table() %>% 
  .[ , C := ifelse(A == 1 | B == 1, 1, 
            ifelse(A == 2 | B == 2, 2, NA))] 

I suspect it has something to do with the way R handles NAs.

Can anyone throw me a bone, here? Thanks! :)

Upvotes: 2

Views: 68

Answers (2)

jpsmith
jpsmith

Reputation: 17174

Use %in%, not == when NA's are in the mix:

df_ %>% data.frame() %>%
  mutate(C = ifelse(A %in% 1 | B %in% 1, 1, 
                    ifelse(A %in% 2 | B %in% 2, 2, NA)))

#    A  B  C
# 1  1 NA  1
# 2  2 NA  2
# 3 NA NA NA
# 4 NA  1  1
# 5 NA  2  2

The reason is that == returns NA whenever the element is NA because equality comparison is undefined, so returns NA - whereas %in% is testing each element in a set (though here the "set" only has one value). Since there is no direct match, it returns FALSE. You can see this with a simpler example here:

x <- c("A", "B", NA, "D")

x == "A"
# [1]  TRUE FALSE NA FALSE

x %in% "A"
# [1]  TRUE FALSE FALSE FALSE

Also, you may want to use dplyr::case_when instead of nested ifelse:

df_ %>% data.frame() %>%
  mutate(C = case_when(
    A %in% 1 | B %in% 1 ~ 1,
    A %in% 2 | B %in% 2 ~ 2
  ))

#    A  B  C
# 1  1 NA  1
# 2  2 NA  2
# 3 NA NA NA
# 4 NA  1  1
# 5 NA  2  2

Upvotes: 2

TarJae
TarJae

Reputation: 78907

Another apporach is using is.na():

library(dplyr)

df1 <- df %>%
  as.data.frame() %>% 
  mutate(C = ifelse(!is.na(A) & A == 1 | !is.na(B) & B == 1, 1,
                    ifelse(!is.na(A) & A == 2 | !is.na(B) & B == 2, 2, NA)))

df1

library(data.table)
df2 <- data.table(df1)
df2[, C := ifelse(!is.na(A) & A == 1 | !is.na(B) & B == 1, 1,
                  ifelse(!is.na(A) & A == 2 | !is.na(B) & B == 2, 2, NA))]
df2
 A  B  C
1  1 NA  1
2  2 NA  2
3 NA NA NA
4 NA  1  1
5 NA  2  2

 A  B  C
1:  1 NA  1
2:  2 NA  2
3: NA NA NA
4: NA  1  1
5: NA  2  2

Upvotes: 1

Related Questions