iouvxz

Reputation: 163

Python indexing numpy array using a smaller boolean array

https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

If obj.ndim == x.ndim, x[obj] returns a 1-dimensional array filled with the elements of x corresponding to the True values of obj. The search order will be row-major, C-style. If obj has True values at entries that are outside of the bounds of x, then an index error will be raised. If obj is smaller than x it is identical to filling it with False.

I read in the NumPy reference that I can index a larger array with a smaller boolean array, and that the remaining entries will automatically be filled with False.

Example: From an array, select all rows which sum up to less than or equal to two:

>>> x = np.array([[0, 1], [1, 1], [2, 2]])
>>> rowsum = x.sum(-1)
>>> x[rowsum <= 2, :] 
array([[0, 1],[1, 1]])

But if rowsum would have two dimensions as well:

>>> rowsum = x.sum(-1, keepdims=True)
>>> rowsum.shape 
(3, 1)
>>> x[rowsum <= 2, :]    # fails 
IndexError: too many indices
>>> x[rowsum <= 2] 
array([0, 1])

The last one giving only the first elements because of the extra dimension.

But the example simply doesn't work. It fails with "IndexError: boolean index did not match indexed array along dimension 1; dimension is 2 but corresponding boolean dimension is 1".

How can I make it work? I'm using Python 3.6.3 and NumPy 1.13.3.

Upvotes: 3

Views: 1398

Answers (2)

B. M.

Reputation: 18628

Since NumPy 1.11, this is no longer compatible with the new default behaviour (see boolean-indexing-changes in the release notes):

Boolean indexing changes.

  • ...

  • ...

  • Boolean indexes must match the dimension of the axis that they index.

  • ...

The internals have been optimized, but the docs have not been updated yet.
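To illustrate the rule quoted above, here is a minimal sketch. It assumes a NumPy version that enforces the shape check (the question reports 1.13.3). The names mask_1d, mask_2d, and mismatch_raises are only for illustration.

```python
import numpy as np

x = np.array([[0, 1], [1, 1], [2, 2]])

# 1-D mask: its length (3) matches axis 0 of x, so it selects whole rows.
mask_1d = x.sum(-1) <= 2                  # shape (3,)
rows = x[mask_1d]

# 2-D mask from keepdims=True: shape (3, 1) does not match x's shape (3, 2)
# along axis 1, so indexing is expected to raise IndexError.
mask_2d = x.sum(-1, keepdims=True) <= 2   # shape (3, 1)
try:
    x[mask_2d]
    mismatch_raises = False
except IndexError:
    mismatch_raises = True

print(rows)
print(mismatch_raises)
```

So a boolean index is no longer padded with False. Each of its dimensions has to match the axis it indexes.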

Upvotes: 2

Anake

Reputation: 7649

I think what you are looking for is NumPy broadcasting.

import numpy as np

x = np.array([[0, 1], [1, 1], [2, 2]])
rowsum = x.sum(axis=1)
x[rowsum <= 2]

Gives:

array([[0, 1],
       [1, 1]])

The problem is that you used keepdims=True, which makes the sum a column vector of shape (3, 1) rather than a rank-one array of shape (3,), which can be broadcast.
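If you need to keep keepdims=True for other reasons, one possible workaround is to drop the extra axis from the mask before indexing. This is only a sketch. The variable names are illustrative, and np.broadcast_to is used here just to show what a full-shape mask would select instead.

```python
import numpy as np

x = np.array([[0, 1], [1, 1], [2, 2]])
rowsum = x.sum(axis=1, keepdims=True)   # shape (3, 1)
mask = rowsum <= 2                      # shape (3, 1)

# Drop the trailing axis so the mask has shape (3,) and selects rows.
rows = x[mask[:, 0]]

# Alternatively, broadcast the mask explicitly to x's full shape.
# A full-shape boolean index returns a flat array of the selected elements.
full_mask = np.broadcast_to(mask, x.shape)
elems = x[full_mask]

print(rows)
print(elems)
```

The first form gives the rows, as in the original docs example. The second gives the individual elements of those rows as a 1-D array.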

Upvotes: 2
