Kevin

Reputation: 2257

How do I compare NumPy arrays element by element, taking the position of each element into account?

I want to compare two NumPy arrays element by element, taking the position of each element into account. For example:

[1, 2, 3]==[1, 2, 3]  -> True

[1, 2, 3]==[2, 1, 3]  -> False

I tried the following

    for index in range(list1.shape[0]):
        if list1[index] != list2[index]:
            return False
    return True

But I got the following error

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

However, the following is not the correct usage of .any or .all:

 numpy.any(numpy.array([1,2,3]), numpy.array([1,2,3]))


 numpy.all(numpy.array([1,2,3]), numpy.array([1,2,3]))

As it returns

 TypeError: only length-1 arrays can be converted to Python scalars

I am very confused. Can someone explain what I am doing wrong?

Thanks

Upvotes: 2

Views: 876

Answers (2)

Andy Hayden

Reputation: 375435

You can also use array_equal:

In [11]: a = np.array([1, 2, 3])

In [12]: b = np.array([2, 1, 3])

In [13]: np.array_equal(a, a)
Out[13]: True

In [14]: np.array_equal(a, b)
Out[14]: False

This ought to be more efficient since you don't need to keep the temporary a==b...
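As a minimal sketch of how this could replace the loop in the question: the helper name `same_elements` is made up for illustration and is not part of NumPy. It only wraps `np.array_equal`, which gives one answer for the whole array and so is safe to use in an `if`.

```python
import numpy as np

def same_elements(x, y):
    """Return True if x and y have the same shape and equal elements, position by position."""
    # np.array_equal yields a single truth value rather than an array,
    # so there is no "truth value is ambiguous" error here.
    return bool(np.array_equal(x, y))

a = np.array([1, 2, 3])
b = np.array([2, 1, 3])

print(same_elements(a, a))  # same values in the same positions
print(same_elements(a, b))  # same values, different positions
```

The same helper should also work for 2D input, since `np.array_equal` is not limited to one dimension.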


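Note: a little about performance. For larger arrays you want to use np.all rather than the built-in all. array_equal performs about the same as np.all when the shapes match; when the shapes differ (as with c below, which is one element shorter than a), it is much faster because it can return False without comparing any elements: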

In [21]: a = np.arange(100000)

In [22]: b = np.arange(100000)

In [23]: c = np.arange(1, 100000)

In [24]: %timeit np.array_equal(a, a)  # Note: I expected this to check identity (is) first; it doesn't
10000 loops, best of 3: 183 µs per loop

In [25]: %timeit np.array_equal(a, b)
10000 loops, best of 3: 189 µs per loop

In [26]: %timeit np.array_equal(a, c)
100000 loops, best of 3: 5.9 µs per loop

In [27]: %timeit np.all(a == b)
10000 loops, best of 3: 184 µs per loop

In [28]: %timeit np.all(a == c)
10000 loops, best of 3: 40.7 µs per loop

In [29]: %timeit all(a == b)
100 loops, best of 3: 3.69 ms per loop

In [30]: %timeit all(a == c) # ahem!
# TypeError: 'bool' object is not iterable
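The last error comes from the shapes rather than the values: a has 100000 elements and c has 99999. In the session above `a == c` produced a single bool instead of an array, which the built-in all cannot iterate over (newer NumPy releases may raise an error on that comparison instead). A small sketch, reusing the same arrays, of checking the shapes explicitly and letting `np.array_equal` handle the mismatch:

```python
import numpy as np

a = np.arange(100000)
c = np.arange(1, 100000)  # starts at 1, so it is one element shorter than a

# Compare the shapes first: arrays of different shape can never be equal.
shapes_match = a.shape == c.shape
print(a.shape, c.shape, shapes_match)

# np.array_equal copes with the mismatch and simply reports "not equal".
result = np.array_equal(a, c)
print(result)
```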

Upvotes: 3

jonrsharpe

Reputation: 121966

You can pass an array of booleans to all, for example:

>>> import numpy as np
>>> a = np.array([1, 2, 3])
>>> b = np.array([2, 1, 3])
>>> a == b
array([False, False,  True], dtype=bool)
>>> np.all(a==b) # the built-in all also works here, but only for 1D arrays
False

Note that the built-in all is much faster than np.all for small arrays (and np.array_equal is slower still):

>>> timeit.timeit("all(a==b)", setup="import numpy as np; a = np.array([1, 2, 3]); b = np.array([2, 1, 3])")
0.8798369040014222
>>> timeit.timeit("np.all(a==b)", setup="import numpy as np; a = np.array([1, 2, 3]); b = np.array([2, 1, 3])")
9.980971871998918
>>> timeit.timeit("np.array_equal(a, b)", setup="import numpy as np; a = np.array([1, 2, 3]); b = np.array([2, 1, 3])")
13.838635700998566

However, the built-in all will not work correctly with multidimensional arrays:

>>> a = np.arange(9).reshape(3, 3)
>>> b = a.copy()
>>> b[0, 0] = 42
>>> all(a==b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> np.all(a==b)
False
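To make the multidimensional case concrete, here is a short sketch using the same a and b. The variable names `mask`, `whole`, `per_row` and `method` are only for illustration. It shows `np.all` reducing over the whole array, the optional `axis` argument for a per-row answer, and the equivalent `.all()` method.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = a.copy()
b[0, 0] = 42

mask = a == b                    # 3x3 boolean array, False only at [0, 0]
whole = np.all(mask)             # one answer for the entire array
per_row = np.all(mask, axis=1)   # one answer per row
method = mask.all()              # method form of np.all(mask)

print(whole)
print(per_row)
print(method)
```

The built-in all fails here because iterating over a 2D array yields whole rows, and the truth value of a row with several elements is ambiguous.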

For larger arrays, np.all is fastest:

>>> timeit.timeit("np.all(a==b)", setup="import numpy as np; a = np.arange(1000); b = a.copy(); b[999] = 0")
13.581198551000853
>>> timeit.timeit("all(a==b)", setup="import numpy as np; a = np.arange(1000); b = a.copy(); b[999] = 0")
30.610838356002205
>>> timeit.timeit("np.array_equal(a, b)", setup="import numpy as np; a = np.arange(1000); b = a.copy(); b[999] = 0")
17.95089965599982

Upvotes: 2
