Simd

Reputation: 21264

How to test if all rows are equal in a numpy array

In numpy, is there a nice idiomatic way of testing if all rows are equal in a 2d array?

I can do something like

np.all([np.array_equal(M[0], M[i]) for i in xrange(1,len(M))])

This seems to mix Python lists with numpy arrays, which is ugly and presumably also slow.

Is there a nicer/neater way?

Upvotes: 27

Views: 19219

Answers (4)

Roy

Reputation: 53

Regarding the NaN case in Alex's answer, np.isclose and np.allclose now accept an equal_nan argument:

np.isclose([1.0, np.nan], [1.0, np.nan], equal_nan=True)
np.allclose([1.0, np.nan], [1.0, np.nan], equal_nan=True)
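Applied to the row comparison from the question, this could look like the following minimal sketch. The sample array and the variable names are made up for illustration.

```python
import numpy as np

# Two identical rows that both contain a NaN in the same position.
arr = np.array([[1.0, np.nan, 3.0],
                [1.0, np.nan, 3.0]])

# Without equal_nan, NaN never compares as close to NaN.
strict = bool(np.isclose(arr, arr[0]).all())

# With equal_nan=True, NaNs in matching positions count as equal.
nan_ok = bool(np.isclose(arr, arr[0], equal_nan=True).all())

print(strict, nan_ok)  # False True
```

A NaN that appears in only one of the rows is still reported as a difference.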

Upvotes: 1

Alex Riley

Reputation: 176750

One way is to check that every row of the array arr is equal to its first row arr[0]:

(arr == arr[0]).all()
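As a small sketch of how this behaves (the helper name all_rows_equal and the sample arrays are only illustrative, and the array is assumed to have at least one row):

```python
import numpy as np

def all_rows_equal(arr):
    """Return True if every row of a 2D array equals its first row."""
    arr = np.asarray(arr)
    # arr[0] has shape (n_cols,) and broadcasts against every row of arr.
    return bool((arr == arr[0]).all())

same = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
diff = np.array([[1, 2, 3], [1, 2, 4], [1, 2, 3]])

print(all_rows_equal(same))  # True
print(all_rows_equal(diff))  # False
```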

Using equality == is fine for integer values, but if arr contains floating point values you could use np.isclose instead to check for equality within a given tolerance:

np.isclose(arr, arr[0]).all()
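To illustrate why the tolerance matters, here is a short sketch with made-up values in which two rows are mathematically equal but differ by floating point rounding:

```python
import numpy as np

# Both rows represent [0.3, 1.0], but 0.1 + 0.2 is not exactly 0.3 in binary floating point.
arr = np.array([[0.1 + 0.2, 1.0],
                [0.3,       1.0]])

exact = bool((arr == arr[0]).all())          # exact comparison
close = bool(np.isclose(arr, arr[0]).all())  # default tolerances: rtol=1e-05, atol=1e-08

print(exact, close)  # False True
```

The rtol and atol arguments of np.isclose can be adjusted if the defaults do not suit your data.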

If your array contains NaN and you want to avoid the tricky NaN != NaN issue, you could combine this approach with np.isnan:

(np.isclose(arr, arr[0]) | np.isnan(arr)).all()
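One thing to be aware of is that this expression accepts a NaN in any position, whether or not the first row also has a NaN there. The sketch below shows this on a made-up array, together with a stricter variant. The stricter variant is an assumption about what you may want, not part of the approach above.

```python
import numpy as np

arr = np.array([[1.0, 2.0],
                [1.0, np.nan]])  # NaN only in the second row

# Any NaN cell is accepted, so the rows are reported as equal.
lenient = bool((np.isclose(arr, arr[0]) | np.isnan(arr)).all())

# Stricter variant (assumption): a NaN only matches a NaN in the first row.
both_nan = np.isnan(arr) & np.isnan(arr[0])
strict = bool((np.isclose(arr, arr[0]) | both_nan).all())

print(lenient, strict)  # True False
```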

Upvotes: 33

lucidyan

Reputation: 3893

It is worth mentioning that comparing against arr[0] only checks equality along the first axis, so it does not carry over directly to other axes of a multidimensional array.

For example, given a three-dimensional image tensor img of shape [256, 256, 3], suppose we need to check whether its three [256, 256] RGB layers are identical. In this case, we need to use broadcasting:

(img == img[:, :, 0, np.newaxis]).all()

This is because plain img[:, :, 0] has shape [256, 256], but we need shape [256, 256, 1] to broadcast across the layers.
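A scaled-down sketch of the same idea follows. The 4x4 size, the random data and the variable names are stand-ins chosen for illustration rather than a real image.

```python
import numpy as np

rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(4, 4))     # small stand-in for one [256, 256] layer
img = np.stack([gray, gray, gray], axis=-1)  # shape (4, 4, 3), three identical layers

first = img[:, :, 0, np.newaxis]             # shape (4, 4, 1), broadcasts over the last axis
layers_equal = bool((img == first).all())

# Change a single value in the last layer and check again.
img2 = img.copy()
img2[0, 0, 2] += 1
layers_equal2 = bool((img2 == img2[:, :, 0, np.newaxis]).all())

print(first.shape, layers_equal, layers_equal2)  # (4, 4, 1) True False
```

Slicing with img[..., :1] instead of indexing with 0 also keeps the last dimension and gives the same (4, 4, 1) shape.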

Upvotes: 6

Ashwini Chaudhary

Reputation: 250901

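Simply check whether the number of unique items in the array is 1 (note that this tests whether all elements are equal, which is stricter than all rows being equal):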

>>> arr = np.array([[1]*10 for _ in xrange(5)])
>>> len(np.unique(arr)) == 1
True
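Because np.unique flattens its input by default, the check above only passes when every element is the same. If the rows are equal but contain different values, passing axis=0 counts distinct rows instead. A small sketch with a made-up array:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [1, 2, 3]])  # rows are equal, elements are not

elements_equal = len(np.unique(arr)) == 1      # counts distinct elements
rows_equal = len(np.unique(arr, axis=0)) == 1  # counts distinct rows

print(elements_equal, rows_equal)  # False True
```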

A solution inspired by unutbu's answer:

>>> arr = np.array([[1]*10 for _ in xrange(5)])
>>> np.all(np.all(arr == arr[0,:], axis = 1))
True

One problem with your code is that you build the entire list before applying np.all() to it, so no short-circuiting happens in your version. It would be better to use Python's built-in all() with a generator expression, as in the last command of each timing run below.
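A minimal sketch of the generator version, written for Python 3 (range rather than xrange). The helper name all_rows_equal_lazy is made up for illustration.

```python
import numpy as np

def all_rows_equal_lazy(M):
    """Compare each row to the first one, stopping at the first mismatch."""
    first = M[0]
    # all() consumes the generator lazily and returns as soon as it sees False.
    return all(np.array_equal(first, row) for row in M[1:])

mixed = np.array([[3] * 100] + [[2] * 100 for _ in range(1000)])
uniform = np.array([[2] * 100 for _ in range(1000)])

print(all_rows_equal_lazy(mixed))    # False, decided after the first comparison
print(all_rows_equal_lazy(uniform))  # True, every row has to be checked
```

The short-circuit only helps when a mismatch occurs early. When all rows are equal, every row is still compared in a Python-level loop.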

Timing comparisons:

>>> M = arr = np.array([[3]*100] + [[2]*100 for _ in xrange(1000)])
>>> %timeit np.all(np.all(arr == arr[0,:], axis = 1))
1000 loops, best of 3: 272 µs per loop
>>> %timeit (np.diff(M, axis=0) == 0).all()
1000 loops, best of 3: 596 µs per loop
>>> %timeit np.all([np.array_equal(M[0], M[i]) for i in xrange(1,len(M))])
100 loops, best of 3: 10.6 ms per loop
>>> %timeit all(np.array_equal(M[0], M[i]) for i in xrange(1,len(M)))
100000 loops, best of 3: 11.3 µs per loop

>>> M = arr = np.array([[2]*100 for _ in xrange(1000)])
>>> %timeit np.all(np.all(arr == arr[0,:], axis = 1))
1000 loops, best of 3: 330 µs per loop
>>> %timeit (np.diff(M, axis=0) == 0).all()
1000 loops, best of 3: 594 µs per loop
>>> %timeit np.all([np.array_equal(M[0], M[i]) for i in xrange(1,len(M))])
100 loops, best of 3: 9.51 ms per loop
>>> %timeit all(np.array_equal(M[0], M[i]) for i in xrange(1,len(M)))
100 loops, best of 3: 9.44 ms per loop

Upvotes: 5
