hellter

Reputation: 1004

How to treat NAs like values when comparing elementwise in R

I want to compare two vectors elementwise to check whether an element in a certain position in the first vector is different from the element in the same position in the second vector.
The point is that I have NA values inside the vectors, and when doing the comparison for these values I get NA instead of TRUE or FALSE.

Reproducible example:

Here is what I get:

a<-c(1, NA, 2, 2, NA)
b<-c(1, 1, 1, NA, NA)
a!=b
[1] FALSE  TRUE    NA    NA    NA

Here is how I would like the != operator to work (treat NA values as if they were just another "level" of the variable):

a!=b
[1] FALSE  TRUE  TRUE  TRUE FALSE

There's a possible solution at this link, but its author writes a function to perform the task. I was wondering whether there's a more elegant way to do it.

Upvotes: 13

Views: 8358

Answers (5)

nagbalae

Reputation: 11

I'm not sure about it being the most elegant, but

paste(a) != paste(b)

(this converts all elements of both vectors to strings)

It has the desired output and is simpler than most of the other answers.
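
For the vectors in the question this can be checked directly. The following is a minimal sketch; the character vectors x and y are made up here to illustrate one caveat, namely that after paste() a missing value and the literal string "NA" can no longer be told apart.

```r
a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)

# paste() turns NA into the string "NA", so the comparison never yields NA
res <- paste(a) != paste(b)
print(res)
# [1] FALSE  TRUE  TRUE  TRUE FALSE

# Caveat (hypothetical data): a genuine "NA" string is treated like a missing value
x <- c("NA", "u")
y <- c(NA, "u")
res_chr <- paste(x) != paste(y)
print(res_chr)
# [1] FALSE FALSE
```

So the approach seems safe for numeric input like the question's, but may need care with character vectors that can contain the text "NA".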

Upvotes: 1

Tomas

Reputation: 59475

I like this one, since it is pretty simple and it's easy to see that it works (source):

# This function returns TRUE wherever elements are the same, including NA's,
# and FALSE everywhere else.
compareNA <- function(v1, v2) 
{
    same <- (v1 == v2) | (is.na(v1) & is.na(v2))
    same[is.na(same)] <- FALSE
    return(same)
}
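
Since compareNA() reports where elements are the same, the "different" result the question asks for is its negation. A small usage sketch on the question's vectors (the function is repeated so the snippet runs on its own):

```r
compareNA <- function(v1, v2) {
  # TRUE where both values are equal, or where both are NA
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  # Whatever is still NA involved exactly one missing value: not the same
  same[is.na(same)] <- FALSE
  same
}

a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)

same <- compareNA(a, b)
print(same)
# [1]  TRUE FALSE FALSE FALSE  TRUE

different <- !same
print(different)
# [1] FALSE  TRUE  TRUE  TRUE FALSE
```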

Upvotes: 7

CJB

Reputation: 1809

Here is another solution. It's probably slower than my other answer because it's not vectorised, but it's certainly more elegant. I noticed the other day that %in% compares NA like other values. Thus c(1L, NA) %in% 1:4 gives TRUE FALSE rather than TRUE NA, for example.

So you can have:

!mapply(`%in%`, a, b)
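
A short sketch of both steps on the question's vectors: first the %in% behaviour mentioned above, then the pairwise call through mapply().

```r
a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)

# %in% is built on match(), which by default matches NA against NA
in_demo <- c(1L, NA) %in% 1:4
print(in_demo)
# [1]  TRUE FALSE

# mapply() calls `%in%` once per pair (a[i], b[i]); negate to get "different"
res <- !mapply(`%in%`, a, b)
print(res)
# [1] FALSE  TRUE  TRUE  TRUE FALSE
```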

Upvotes: 4

akrun

Reputation: 887088

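We could replace the NA values on the fly with a value v1 that is present in neither vector, and then apply !=. Note that the function below draws v1 from 1:1000, so it assumes numeric vectors that do not already contain every integer in that range.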

f1 <- function(x, y) {
  v1 <- setdiff(1:1000, na.omit(unique(c(x,y))))[1]
  replace(x, is.na(x), v1) != replace(y, is.na(y), v1)
}

f1(a,b)
#[1] FALSE  TRUE  TRUE  TRUE FALSE
f1(a1,b1)
#[1] TRUE TRUE TRUE
f1(a2,b2)
#[1] FALSE  TRUE  TRUE FALSE

Data:

a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)
a1 <- c(NA, 1, NA)
b1 <- c(2, NA, 3)
a2 <- c(1, NA, 2, NA)
b2 <- c(1, 1, 3, NA)

Upvotes: 1

CJB

Reputation: 1809

Taking advantage of the fact that:

T & NA = NA but F & NA = F

and

F | NA = NA but T | NA = T

The following solution works, with carefully placed brackets:

(a != b | (is.na(a) & !is.na(b)) | (is.na(b) & !is.na(a))) & !(is.na(a) & is.na(b))

You could define:

`%!=na%` <- function(e1, e2) (e1 != e2 | (is.na(e1) & !is.na(e2)) | (is.na(e2) & !is.na(e1))) & !(is.na(e1) & is.na(e2))

and then use:

a %!=na% b
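
Putting it together on the question's vectors, as a runnable sketch. The first lines only confirm the NA logic rules listed above; the operator body is the same expression, split over two lines for readability.

```r
# Three-valued logic the expression relies on
nl <- c(TRUE & NA, FALSE & NA, FALSE | NA, TRUE | NA)
print(nl)
# [1]    NA FALSE    NA  TRUE

`%!=na%` <- function(e1, e2) {
  # "different" if the values differ or exactly one is NA ...
  (e1 != e2 | (is.na(e1) & !is.na(e2)) | (is.na(e2) & !is.na(e1))) &
    # ... but never when both are NA (FALSE & NA is FALSE)
    !(is.na(e1) & is.na(e2))
}

a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)

res <- a %!=na% b
print(res)
# [1] FALSE  TRUE  TRUE  TRUE FALSE
```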

Upvotes: 14
