Reputation: 1300
I have a data.table and an assigned column like this:
a <- c("CA","Canada",NA, "Canada", "CA")
b <- c("pass", NA,"pass","pass","pass")
dt <- data.table(a,b)
dt[,c := ifelse((a=="CA"|a=="Canada") & b=="pass", "pass","fail")]
My output looks like this:
a b c
CA pass pass
Canada <NA> <NA>
<NA> pass <NA>
Canada pass pass
CA pass pass
As I understand, where column a is "CA" and column b is "pass" or where column a is "Canada" and column b is "pass", a "pass" should be returned for column c and when the condition fails a "fail" should be passed in column c. However, I am getting NAs for all columns where an NA is anywhere in the column. Is there a way around this behavior without having to coerce all NAs to ""?
Upvotes: 1
Views: 448
Reputation: 160447
This is because you cannot use ==
on NA
and expect meaningful results, as @thelatemail said in his comment. However, one trick is that %in%
does allow easier comparison, so you can change to this:
dt[,c := ifelse(a %in% c("CA", "Canada") & b %in% "pass", "pass", "fail")]
# a b c
# 1: CA pass pass
# 2: Canada <NA> fail
# 3: <NA> pass fail
# 4: Canada pass pass
# 5: CA pass pass
Upvotes: 4