Reputation: 500437

Comparing NumPy arrays so that NaNs compare equal

Is there an idiomatic way to compare two NumPy arrays that treats NaNs as equal to each other (but not equal to anything other than a NaN)?

For example, I want the following two arrays to compare equal:

np.array([1.0, np.nan, 2.0])
np.array([1.0, np.nan, 2.0])

and the following two arrays to compare unequal:

np.array([1.0, np.nan, 2.0])
np.array([1.0, 0.0, 2.0])

I am looking for a method that would produce a scalar Boolean outcome.

The following would do it:

np.all((a == b) | (np.isnan(a) & np.isnan(b)))

but it's clunky and creates all those intermediate arrays.

Is there a way that's easier on the eye and makes better use of memory?

P.S. If it helps, the arrays are known to have the same shape and dtype.

Upvotes: 24

Answers (4)

joris

Reputation: 139172

NumPy 1.10 added the equal_nan keyword to np.allclose (https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html).

So you can now do:

In [24]: np.allclose(np.array([1.0, np.nan, 2.0]),
                     np.array([1.0, np.nan, 2.0]), equal_nan=True)
Out[24]: True
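
Note that np.allclose compares within a tolerance rather than exactly. If an exact comparison is wanted, newer NumPy releases (1.19 and later) also accept equal_nan in np.array_equal. A minimal sketch of the difference, where the names a, b, c and d are just illustrative:

```python
import numpy as np

a = np.array([1.0, np.nan, 2.0])
b = np.array([1.0, np.nan, 2.0])
c = np.array([1.0, 0.0, 2.0])

# Exact comparison; NaNs in matching positions count as equal (NumPy >= 1.19).
same = np.array_equal(a, b, equal_nan=True)
different = np.array_equal(a, c, equal_nan=True)

# allclose tolerates small differences, so it can report True
# where an exact comparison reports False.
d = a + 1e-12
close = np.allclose(a, d, equal_nan=True)
exact = np.array_equal(a, d, equal_nan=True)

print(same, different, close, exact)
```

Both calls return a plain scalar Boolean, which is what the question asks for.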

Upvotes: 10

sega_sai

Reputation: 8538

If you really care about memory use (e.g. you have very large arrays), you should use numexpr; the following expression will work for you:

np.all(numexpr.evaluate('(a==b)|((a!=a)&(b!=b))'))

I've tested it on very big arrays of length 3e8, and the code has the same performance on my machine as

np.all(a==b)

and uses the same amount of memory.
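
If numexpr is not available, a different technique with a similar goal is to compare the arrays block by block in plain NumPy, so that the temporaries never exceed one block and the loop can stop at the first mismatch. This is only a sketch, not what numexpr does internally; the helper name nan_equal_chunked and the chunk parameter are made up for illustration:

```python
import numpy as np

def nan_equal_chunked(a, b, chunk=1 << 20):
    """Hypothetical helper: NaN-aware equality, checked block by block."""
    if a.shape != b.shape:
        return False
    # reshape(-1) is a view for contiguous arrays (it may copy otherwise).
    a_flat = a.reshape(-1)
    b_flat = b.reshape(-1)
    for start in range(0, a_flat.size, chunk):
        x = a_flat[start:start + chunk]
        y = b_flat[start:start + chunk]
        # x != x is True only where x is NaN, as in the numexpr expression.
        if not np.all((x == y) | ((x != x) & (y != y))):
            return False  # stop at the first block that differs
    return True

a = np.array([1.0, np.nan, 2.0])
b = np.array([1.0, np.nan, 2.0])
c = np.array([1.0, 0.0, 2.0])
print(nan_equal_chunked(a, b, chunk=2), nan_equal_chunked(a, c, chunk=2))
```

The block size trades Python-level loop overhead against the size of the intermediate Boolean arrays.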

Upvotes: 18

DSM

Reputation: 353179

Disclaimer: I don't recommend this for regular use, and I wouldn't use it myself, but I could imagine rare circumstances under which it might be useful.

If the arrays have the same shape and dtype, you could consider using the low-level memoryview:

>>> import numpy as np
>>> 
>>> a0 = np.array([1.0, np.NAN, 2.0])
>>> ac = a0 * (1+0j)
>>> b0 = np.array([1.0, np.NAN, 2.0])
>>> b1 = np.array([1.0, np.NAN, 2.0, np.NAN])
>>> c0 = np.array([1.0, 0.0, 2.0])
>>> 
>>> memoryview(a0)
<memory at 0x85ba1bc>
>>> memoryview(a0) == memoryview(a0)
True
>>> memoryview(a0) == memoryview(ac) # equal but different dtype
False
>>> memoryview(a0) == memoryview(b0) # hooray!
True
>>> memoryview(a0) == memoryview(b1)
False
>>> memoryview(a0) == memoryview(c0)
False

But beware of subtle problems like this:

>>> zp = np.array([0.0])
>>> zm = -1*zp
>>> zp
array([ 0.])
>>> zm
array([-0.])
>>> zp == zm
array([ True], dtype=bool)
>>> memoryview(zp) == memoryview(zm)
False

which happens because the binary representations differ even though the values compare equal (they have to differ, of course: that's how NumPy knows to print the negative sign):

>>> memoryview(zp)[0]
'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> memoryview(zm)[0]
'\x00\x00\x00\x00\x00\x00\x00\x80'

On the bright side, it short-circuits the way you might hope it would:

In [47]: a0 = np.arange(10**7)*1.0
In [48]: a0[-1] = np.NAN    
In [49]: b0 = np.arange(10**7)*1.0    
In [50]: b0[-1] = np.NAN     
In [51]: timeit memoryview(a0) == memoryview(b0)
10 loops, best of 3: 31.7 ms per loop
In [52]: c0 = np.arange(10**7)*1.0    
In [53]: c0[0] = np.NAN   
In [54]: d0 = np.arange(10**7)*1.0    
In [55]: d0[0] = 0.0    
In [56]: timeit memoryview(c0) == memoryview(d0)
100000 loops, best of 3: 2.51 us per loop

and for comparison:

In [57]: timeit np.all((a0 == b0) | (np.isnan(a0) & np.isnan(b0)))
1 loops, best of 3: 296 ms per loop
In [58]: timeit np.all((c0 == d0) | (np.isnan(c0) & np.isnan(d0)))
1 loops, best of 3: 284 ms per loop
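
The sessions above come from Python 2. As far as I understand, Python 3 compares memoryview objects element by element for formats such as 'd', so a NaN would again be unequal to itself there. A rough sketch of a bytewise comparison for Python 3 is to cast both views to unsigned bytes first. The helper name bytes_equal is hypothetical, and the sketch assumes a simple native dtype such as float64:

```python
import numpy as np

def bytes_equal(a, b):
    """Hypothetical helper: True if both arrays hold identical raw bytes."""
    if a.shape != b.shape or a.dtype != b.dtype:
        return False
    if a.size == 0:
        return True  # memoryview.cast rejects shapes that contain zeros
    # cast('B') exposes the buffer as unsigned bytes without copying;
    # it needs C-contiguous data, hence ascontiguousarray.
    va = memoryview(np.ascontiguousarray(a)).cast('B')
    vb = memoryview(np.ascontiguousarray(b)).cast('B')
    return va == vb

a0 = np.array([1.0, np.nan, 2.0])
b0 = np.array([1.0, np.nan, 2.0])
c0 = np.array([1.0, 0.0, 2.0])
zp = np.array([0.0])
zm = -1 * zp

print(bytes_equal(a0, b0), bytes_equal(a0, c0), bytes_equal(zp, zm))
```

The same caveats apply: 0.0 and -0.0 differ at the byte level, and so would two NaNs that carry different bit patterns.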

Upvotes: 8

Ethan Coon

Reputation: 771

Not sure this is any better, but a thought...

import numpy

# Scalar subclass whose == treats two NaNs as equal.
# (numpy.float_ was removed in NumPy 2.0; numpy.float64 is the same type.)
class FloatOrNaN(numpy.float64):
    def __eq__(self, other):
        return (numpy.isnan(self) and numpy.isnan(other)) or super().__eq__(other)

a = [1., numpy.nan, 2.]
one = numpy.array([FloatOrNaN(val) for val in a], dtype=object)
two = numpy.array([FloatOrNaN(val) for val in a], dtype=object)
print(one == two)   # elementwise result: True, True, True

This pushes the ugliness into the dtype, at the expense of making the inner workings Python instead of C (Cython etc. would fix this). It does, however, greatly reduce memory costs.

Still kinda ugly though :(

Upvotes: 0
