ericbrownaustin
ericbrownaustin

Reputation: 1300

data.table ifelse with multiple conditions not returning desired output

I have a data.table and an assigned column like this:

a <- c("CA","Canada",NA, "Canada", "CA")
b <- c("pass", NA,"pass","pass","pass")

dt <- data.table(a,b)

dt[,c := ifelse((a=="CA"|a=="Canada") & b=="pass", "pass","fail")]

My output looks like this:

    a    b    c
 CA      pass pass
 Canada  <NA> <NA>
 <NA>    pass <NA>
 Canada  pass pass
 CA      pass pass

As I understand, where column a is "CA" and column b is "pass" or where column a is "Canada" and column b is "pass", a "pass" should be returned for column c and when the condition fails a "fail" should be passed in column c. However, I am getting NAs for all columns where an NA is anywhere in the column. Is there a way around this behavior without having to coerce all NAs to ""?

Upvotes: 1

Views: 448

Answers (1)

r2evans
r2evans

Reputation: 160447

This is because you cannot use == on NA and expect meaningful results, as @thelatemail said in his comment. However, one trick is that %in% does allow easier comparison, so you can change to this:

dt[,c := ifelse(a %in% c("CA", "Canada") & b %in% "pass", "pass", "fail")]
#         a    b    c
# 1:     CA pass pass
# 2: Canada <NA> fail
# 3:   <NA> pass fail
# 4: Canada pass pass
# 5:     CA pass pass

Upvotes: 4

Related Questions