DonQuiKong

Reputation: 411

Comparing numpy array of dtype object

My question is: why?

aa[0]
array([[405, 162, 414, 0,
        array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],
      dtype=object),
        0, 0, 0]], dtype=object)

aaa
array([[405, 162, 414, 0,
        array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],
      dtype=object),
        0, 0, 0]], dtype=object)

np.array_equal(aaa,aa[0])
False

Those arrays are completely identical.

My minimal example doesn't reproduce this:

be=np.array([1],dtype=object)

be
array([1], dtype=object)

ce=np.array([1],dtype=object)

ce
array([1], dtype=object)

np.array_equal(be,ce)
True

Nor does this one:

ce=np.array([np.array([1]),'5'],dtype=object)

be=np.array([np.array([1]),'5'],dtype=object)

np.array_equal(be,ce)
True

However, to reproduce my problem try this:

be=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

ce=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

np.array_equal(be,ce)
False

np.array_equal(be[0],ce[0])
False

I have no idea why those are not equal. And as a bonus question: how do I compare them?

I need an efficient way to check if aaa is in the stack aa.

I'm not using aaa in aa because it gives "DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.", and, in case anyone is wondering, it still returns False.
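A side note on the "in" approach: on an ndarray, x in arr is evaluated like (arr == x).any(), so it is true as soon as any single element matches, not when a whole row matches. A minimal sketch with a hypothetical numeric stack (not the question's object data; has_row is an illustrative helper, not a NumPy API) showing the difference and a row-wise test:

```python
import numpy as np

# Hypothetical numeric stack: each row is one entry.
stack = np.array([[1, 2],
                  [3, 4]])
probe = np.array([1, 5])               # not a row of `stack`

# `probe in stack` behaves like (stack == probe).any():
# the single matching element 1 is enough to make it True.
naive = probe in stack

# Row-wise membership: some row must match in every position.
def has_row(arr, row):
    return bool((arr == row).all(axis=1).any())

row_hit = has_row(stack, np.array([3, 4]))
row_miss = has_row(stack, probe)
```

For object arrays with nested containers, the row-wise == itself is what breaks, so this only helps for plain numeric stacks.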


What else have I tried?:

np.equal(be,ce)
*** ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

np.all(be,ce)
*** TypeError: only integer scalar arrays can be converted to a scalar index

all(be,ce)
*** TypeError: all() takes exactly one argument (2 given)

all(be==ce)
*** TypeError: 'bool' object is not iterable

np.where(be==ce)
(array([], dtype=int64),)

And these, which I couldn't get to run in the interactive console, all evaluate to False, some of them emitting the deprecation warning:

import numpy as np

ce=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

be=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

print(np.any([bee in ce for bee in be]))

print(np.any([bee==cee for bee in be for cee in ce]))

print(np.all([bee in ce for bee in be]))

print(np.all([bee==cee for bee in be for cee in ce]))

And of course other questions telling me this should work...

Upvotes: 8

Views: 5434

Answers (3)

Paul Panzer

Reputation: 53079

The behavior you are seeing is partly documented here

Deprecations

...

Object array equality comparisons

In the future object array comparisons both == and np.equal will not make use of identity checks anymore. For example:

>>> a = np.array([np.array([1, 2, 3]), 1])
>>> b = np.array([np.array([1, 2, 3]), 1])
>>> a == b

will consistently return False (and in the future an error) even if the array in a and b was the same object.

The equality operator == will in the future raise errors like np.equal if broadcasting or element comparisons, etc. fails.

Comparison with arr == None will in the future do an elementwise comparison instead of just returning False. Code should be using arr is None.

All of these changes will give Deprecation- or FutureWarnings at this time.
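The "identity checks" in this note are the same shortcut CPython's own containers take when comparing items: an element is treated as equal to itself without calling == at all. A NumPy-free sketch using NaN, which is never == to itself, makes the shortcut visible:

```python
nan = float("nan")

# NaN is never equal to itself under ==.
self_equal = nan == nan                               # False

# Lists compare items with an identity check first, so the
# *same* NaN object counts as equal ...
same_object = [nan] == [nan]                          # True

# ... while two distinct NaN objects do not.
distinct_objects = [float("nan")] == [float("nan")]   # False
```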

So far, so clear. Or is it?

We can see from @kmario23's answer that, as of NumPy 1.15.2, these changes are not fully implemented yet.

To make matters worse, consider this:

>>> A = np.array([None, a])
>>> A1 = np.array([None, a])
>>> At = np.array([None, a[:2]])
>>> 
>>> A==A1
False
>>> A==At
array([ True, False])
>>> 

It looks like the current behavior is more a coincidence than the result of careful planning.

I suspect it all comes down to whether an exception is raised during element-wise comparison, cf. here and here.

If two corresponding elements of the containing arrays are arrays themselves and of compatible shapes as in A==A1, their comparison yields an array of bools. Trying to cast this to a scalar bool raises an exception. Currently, exceptions are caught and a scalar False is returned.

In the A==At example, an exception is raised when the last two elements are compared, because their shapes don't broadcast. This is caught, and the comparison for this element returns a scalar False, which is why comparing the containing arrays returns a "normal" array of bools.
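The failing step can be reproduced without any object arrays. A minimal sketch (variable names are illustrative): == on two equal-length arrays returns a bool array, and asking for a single truth value of that array raises the kind of ValueError suspected above to be raised, and swallowed, inside the object-array comparison.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

# Comparing two equal-shaped arrays gives one bool per element ...
elementwise = a == b            # array([ True,  True,  True])

# ... but an object-array comparison needs ONE bool per element pair,
# and converting a multi-element bool array to bool is an error.
message = ""
try:
    collapsed = bool(elementwise)
except ValueError as err:
    collapsed = None            # no scalar answer could be produced
    message = str(err)
```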

What about the workarounds suggested by @kmario23 and @Kanak? Do they work?

Well, yes ...

>>> np.equal(A, A1, dtype=object)
array([True, array([ True,  True,  True])], dtype=object)
>>> wrpr(np.equal(A, A1, dtype=object))
True

... and no.

>>> AA = np.array([None, A])
>>> AA1 = np.array([None, A1])
>>> np.equal(AA, AA1, dtype=object)
array([True, False], dtype=object)
>>> wrpr(np.equal(AA, AA1, dtype=object))
False
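A recursive comparison sidesteps both failure modes: it only uses NumPy's own comparison on plain (non-object) arrays, where it is well defined, and walks through object arrays, lists and tuples at any depth. A sketch only; deep_equal is a hypothetical helper, and treating an ndarray and a list with the same contents as unequal is a design choice, not NumPy semantics.

```python
import numpy as np

def deep_equal(x, y):
    """Recursively compare scalars, lists, tuples and (object) ndarrays."""
    if isinstance(x, np.ndarray) and isinstance(y, np.ndarray):
        if x.shape != y.shape:
            return False
        if x.dtype != object and y.dtype != object:
            # Plain numeric/string arrays: NumPy compares them directly.
            return bool(np.array_equal(x, y))
        # At least one object array: recurse element by element.
        return all(deep_equal(p, q) for p, q in zip(x.flat, y.flat))
    if isinstance(x, (list, tuple)) and isinstance(y, (list, tuple)):
        return (type(x) is type(y) and len(x) == len(y)
                and all(deep_equal(p, q) for p, q in zip(x, y)))
    containers = (np.ndarray, list, tuple)
    if isinstance(x, containers) or isinstance(y, containers):
        # Container vs. scalar, or ndarray vs. list/tuple: treated as unequal.
        return False
    return bool(x == y)

# Usage mirroring the A/AA example above. Assumption: `a` is a plain
# three-element array; dtype=object is spelled out because newer NumPy
# refuses to build ragged arrays without it.
a = np.array([1, 2, 3])
A = np.array([None, a], dtype=object)
A1 = np.array([None, a], dtype=object)
AA = np.array([None, A], dtype=object)
AA1 = np.array([None, A1], dtype=object)
At = np.array([None, a[:2]], dtype=object)

same = deep_equal(AA, AA1)
different = deep_equal(A, At)
```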

Upvotes: 2

keepAlive

Reputation: 6665

To complement @kmario23's answer, what about doing

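import numpy as np

def wrpr(bools):
    # A plain scalar result (True/False/np.bool_) ends the recursion.
    if not isinstance(bools, np.ndarray):
        return bool(bools)
    try:
        # Flatten one level of nesting into a single array.
        fltn_bools = np.hstack(bools)
    except (ValueError, TypeError):
        # Ragged or 0-d input: reduce each element separately instead.
        fltn_bools = np.array([wrpr(a) for a in bools.flat])
    ints = fltn_bools.prod()
    # If a nested array survived the product, keep reducing it.
    if isinstance(ints, np.ndarray):
        return wrpr(ints)
    return bool(ints)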

And finally,

>>> wrpr(np.equal(ce, be, dtype=np.object))
True

Checked with NumPy 1.15.1 under both Python 3.6.5 and Python 2.7.13.


But still, as commented here

NumPy is designed for rigid multidimensional grids of numbers. Trying to get anything but a rigid multidimensional grid is going to be painful. (@user2357112, Jul 31 '17 at 23:10)

and/or

Moral of the story: Don't use dtype=object arrays. They are stunted Python lists, with worse performance characteristics, and numpy is not designed to handle the case of sequence-like containers within these object arrays. (@juanpa.arrivillaga, Jul 31 '17 at 23:38)
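Following that advice, one way to get the efficient membership test the question asks for is to leave object arrays behind for the lookup itself: convert every entry into nested tuples once, keep them in a set, and test membership by hashing. A sketch under stated assumptions: to_hashable and make_entry are hypothetical helpers (not NumPy APIs), entries contain only ndarrays, lists, tuples and hashable scalars, and entries containing NaN are not handled specially.

```python
import numpy as np

def to_hashable(x):
    """Recursively turn ndarrays, lists and tuples into nested tuples (hypothetical helper)."""
    if isinstance(x, np.ndarray):
        # Keep the shape so that, e.g., shapes (1, 8) and (8,) stay distinct.
        return ("ndarray", x.shape, tuple(to_hashable(v) for v in x.flat))
    if isinstance(x, (list, tuple)):
        # Tag the container type so [1, 9, 2] and (1, 9, 2) stay distinct.
        return (type(x).__name__, tuple(to_hashable(v) for v in x))
    return x  # assumed to be a hashable scalar: int, float, str, None, ...

def make_entry(first):
    """Hypothetical entry shaped like the question's be/ce."""
    inner = np.array([[1, 9, 2], 18, (405, 18, 207), 64, 'Universal'], dtype=object)
    return np.array([[first, 162, 414, 0, inner, 0, 0, 0]], dtype=object)

stack = [make_entry(405), make_entry(406)]        # stands in for the stack aa
seen = {to_hashable(entry) for entry in stack}    # built once: O(len(stack))

found = to_hashable(make_entry(405)) in seen      # average O(1) per lookup
missing = to_hashable(make_entry(407)) in seen
```

The conversion cost is paid once per entry; after that each lookup is a hash plus a tuple comparison, with no elementwise NumPy comparison involved.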

Upvotes: 2

kmario23

Reputation: 61415

To make an element-wise comparison between the arrays, you can use numpy.equal() with the keyword argument dtype=numpy.object (on NumPy 1.24 and later, where the np.object alias has been removed, write dtype=object), as in:

In [60]: np.equal(be, ce, dtype=np.object)
Out[60]: 
array([[True, True, True, True,
        array([ True,  True,  True,  True,  True]), True, True, True]],
      dtype=object)

P.S. checked using NumPy version 1.15.2 and Python 3.6.6

Edit:

From the release notes for 1.15,

https://docs.scipy.org/doc/numpy-1.15.1/release.html#comparison-ufuncs-accept-dtype-object-overriding-the-default-bool

Comparison ufuncs accept dtype=object, overriding the default bool

This allows object arrays of symbolic types, which override == and other operators to return expressions, to be compared elementwise with np.equal(a, b, dtype=object).
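To see what the release note means, a sketch with a made-up stand-in for a symbolic type (Sym is hypothetical, not part of any library): its == returns an "expression", here just a string, instead of a bool. With the default bool output NumPy keeps only each result's truth value, while dtype=object keeps the expressions themselves.

```python
import numpy as np

class Sym:
    """Hypothetical symbolic variable whose == builds an expression."""
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Return an "expression" (a plain string here) instead of a bool.
        return f"Eq({self.name}, {other.name})"

x = np.array([Sym("a"), Sym("b")], dtype=object)
y = np.array([Sym("c"), Sym("d")], dtype=object)

truthy = np.equal(x, y)                  # bool output: each expression's truth value
exprs = np.equal(x, y, dtype=object)     # object output: the expressions themselves
```

The same mechanism is what makes np.equal(be, ce, dtype=object) return the nested array of bools for the inner element instead of failing.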

Upvotes: 6
