Luis

Reputation: 3497

How to compare two numpy arrays with some NaN values?

I need to compare some numpy arrays which should have the same elements in the same order, except for some NaN values in the second one.

I need a function more or less like this:

def func( array1, array2 ):
    if ???:
        return True
    else:
        return False

Example:

x = np.array( [ 1, 2, 3, 4, 5 ] )
y = np.array( [ 11, 2, 3, 4, 5 ] )
z = np.array( [ 1, 2, np.nan, 4, 5] )

func( x, z ) # returns True
func( y, z ) # returns False

The arrays always have the same length and the NaN values are always in the third one (x and y always contain numbers only). I imagine there is already a function for this, but I just can't find it.

Any ideas?

Upvotes: 5

Views: 3319

Answers (4)

ti7

Reputation: 18806

numpy.isclose() now provides an argument equal_nan for this case!

>>> import numpy as np
>>> np.isclose([1.0, np.nan], [1.0, np.nan])
array([ True, False])
>>> np.isclose([1.0, np.nan], [1.0, np.nan], equal_nan=True)
array([ True,  True])

docs https://numpy.org/doc/stable/reference/generated/numpy.isclose.html
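One caveat, sketched here with the arrays from the question: equal_nan=True only makes a NaN match another NaN, so comparing x against z still reports a mismatch where z is NaN and x is a number. A possible workaround is to copy the first array's values into the second array's NaN slots before comparing. The helper name equal_where_b_is_number is hypothetical, not part of NumPy.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([11, 2, 3, 4, 5])
z = np.array([1, 2, np.nan, 4, 5])

# equal_nan=True only lets NaN match NaN, so x vs z still differs at index 2.
plain = np.isclose(x, z, equal_nan=True).all()


def equal_where_b_is_number(a, b):
    """Hypothetical helper: compare a and b, ignoring positions where b is NaN."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Copy a's values into b's NaN slots so those positions compare as close.
    filled = np.where(np.isnan(b), a, b)
    return bool(np.isclose(a, filled).all())
```

This assumes, as in the question, that only the second array contains NaNs.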

Upvotes: 1

unutbu

Reputation: 879729

You could use isclose to check for equality (or closeness to within a given tolerance -- this is particularly useful when comparing floats) and use isnan to check for NaNs in the second array. Combine the two with bitwise-or (|), and use all to demand every pair is either close or contains a NaN to obtain the desired result:

In [62]: np.isclose(x,z)
Out[62]: array([ True,  True, False,  True,  True], dtype=bool)

In [63]: np.isnan(z)
Out[63]: array([False, False,  True, False, False], dtype=bool)

So you could use:

def func(a, b):
    return (np.isclose(a, b) | np.isnan(b)).all()


In [67]: func(x, z)
Out[67]: True

In [68]: func(y, z)
Out[68]: False

Upvotes: 2

Eric

Reputation: 97601

You can use masked arrays, which have the behaviour you're asking for when combined with np.all:

zm = np.ma.masked_where(np.isnan(z), z)

np.all(x == zm) # returns True
np.all(y == zm) # returns False

Or you could just write out your logic explicitly. Note that numpy arrays need | instead of or, and because | binds more tightly than ==, the comparison has to be wrapped in parentheses:

def func(a, b):
    return np.all((a == b) | np.isnan(a) | np.isnan(b))
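To illustrate why the parentheses matter, here is a small sketch using float versions of the question's arrays (the variable names ok and raised are just for illustration). Without parentheses, Python evaluates the | first, and bitwise OR between a float array and a boolean array is not supported.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# With parentheses: compare first, then OR the two boolean masks.
ok = bool(np.all((x == z) | np.isnan(z)))

# Without parentheses this parses as x == (z | np.isnan(z)).
# Bitwise OR is not defined for float arrays, so NumPy raises TypeError.
try:
    x == z | np.isnan(z)
    raised = False
except TypeError:
    raised = True
```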

Upvotes: 6

willeM_ Van Onsem

Reputation: 476709

What about:

from math import isnan

def fun(array1,array2):
    return all(isnan(x) or isnan(y) or x == y for x,y in zip(array1,array2))

This function works in both directions: if there are NaNs in the first list, these are ignored as well. If you do not want that (which is a bit odd, since equality is usually symmetric), you can define:

from math import isnan

def fun(array1,array2):
    return all(isnan(y) or x == y for x,y in zip(array1,array2))

The code works as follows: we use zip to emit tuples of elements of both arrays. Next we check if either the element of the first list is NaN, or the second, or they are equal.

If you want a more robust function, you should also perform a length check, since zip silently stops at the end of the shorter input:

from math import isnan

def fun(array1,array2):
    return len(array1) == len(array2) and all(isnan(y) or x == y for x,y in zip(array1,array2))
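For larger arrays, the same one-directional check can be written in vectorized form. This is only a sketch under the assumption that both inputs can be converted to float arrays; the name fun_np is hypothetical.

```python
import numpy as np


def fun_np(array1, array2):
    """Hypothetical vectorized variant: NaNs in array2 are ignored."""
    a = np.asarray(array1, dtype=float)
    b = np.asarray(array2, dtype=float)
    # Different shapes can never match; this replaces the len() check.
    if a.shape != b.shape:
        return False
    # Each position must either be NaN in b or hold equal values.
    return bool(np.all(np.isnan(b) | (a == b)))
```

It uses exact equality like the zip version above; np.isclose could be swapped in for a == b if a tolerance is wanted.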

Upvotes: 1
