kabrice

Reputation: 1625

Boolean indexing on a multidimensional array

I'm very new to Python and NumPy. In fact, I'm just learning.

I'm reading this tutorial, and I got stuck on these lines:

>>> x = np.arange(30).reshape(2,3,5)
>>> x
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],
       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])
>>> b = np.array([[True, True, False], [False, True, True]])
>>> x[b]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])

I can't understand how the result of x[b] is obtained.

I also tried to guess the result of x[[False, False, False, True]].

Please explain it to me; I'm a complete beginner.

Upvotes: 2

Views: 2179

Answers (2)

Divakar

Reputation: 221684

Under the hood, NumPy computes the subscripted indices (the indices along each dimension) of the True entries for the dimensions covered by the mask, starting from the axis at which the mask is applied, and selects all elements along the remaining un-indexed axes.

Case 1: 3D data and 2D mask

For example, b has two dimensions, so it maps onto two axes of x. With x[b], those are the first two axes.

The subscripted indices are computed with np.where/np.nonzero:

r,c = np.nonzero(b)

Thus, x[b] translates to x[r, c, :], or simply x[r, c]. It then uses advanced indexing to select elements from the index pairs formed by r and c, one (r[i], c[i]) pair per True entry in b.
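As a quick check of Case 1, here is a minimal sketch using the x and b from the question. The variable name same is only for illustration.

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)
b = np.array([[True, True, False], [False, True, True]])

# Positions of the True entries in the mask, one array per mask axis
r, c = np.nonzero(b)
print(r)  # [0 0 1 1]
print(c)  # [0 1 1 2]

# The mask covers the first two axes; the last axis is taken whole
same = np.array_equal(x[b], x[r, c, :])
print(same)        # True
print(x[b].shape)  # (4, 5)
```

Each (r[i], c[i]) pair picks one length-5 row, which is why the result has four rows of five elements.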

Case 2: 4D data and 2D mask

Now, let's increase the dimensionality of the data array to 4D and keep the same 2D mask, but apply the mask starting from the second axis, i.e. x[:, b].

Let's say we have

x = np.arange(60).reshape(2,2,3,5)

Get the subscripted indices and then use advanced-indexing:

r,c = np.nonzero(b)

So, x[:, b] should be the same as x[:, r, c]:

In [148]: x = np.arange(60).reshape(2, 2, 3, 5)

In [149]: b = np.array([[True, True, False], [False, True, True]])

In [150]: r,c = np.nonzero(b)

In [151]: np.allclose(x[:, b], x[:, r, c])
Out[151]: True

Case 3: 4D data and 3D mask

To go deeper, let's consider a 3D mask with a 4D data array and verify that the same rule holds:

In [144]: x = np.arange(60).reshape(2, 2, 3, 5)
     ...: b = np.random.rand(2, 3, 5) > 0.5

In [146]: r, c, p = np.nonzero(b)

In [147]: np.allclose(x[:, b], x[:, r, c, p])
Out[147]: True

As for the edit, x[[False, False, False, True]]: you are indexing only along the first axis with a boolean array of length 4, whereas the first axis of x has length 2. The lengths don't match, so the indexing reports an error.
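A small sketch of that failure, assuming a reasonably recent NumPy (where a mismatched boolean mask raises IndexError) and the 3D x from the question. The names bad, good and raised are hypothetical, chosen for this example only.

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)

# A mask of length 4 cannot index a first axis of length 2
bad = np.array([False, False, False, True])
try:
    x[bad]
    raised = False
except IndexError:
    raised = True
print(raised)

# A mask of length 2 matches the first axis and keeps only the second block
good = np.array([False, True])
print(x[good].shape)  # (1, 3, 5)
```

A 1D mask only covers the first axis, so its length has to equal x.shape[0].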

Upvotes: 4

Stanko

Reputation: 4475

The first block of x holds 3 arrays (rows) inside 1 array:

[
 [ 0,  1,  2,  3,  4],
 [ 5,  6,  7,  8,  9],
 [10, 11, 12, 13, 14]
]  

With the line b = np.array([[True, True, False], ...]) you say that you want to keep the first 2 rows (the first 2 True values) and drop the last row (the final False value).

The other part works the same way. The second block also holds 3 arrays inside 1 array:

[
 [15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24],
 [25, 26, 27, 28, 29]
]

And the line b = np.array([..., [False, True, True]]) says to drop the first row (because the first value is False) and keep the last two rows (the last 2 values are True).
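This row-by-row reading can be sketched as follows, applying each row of b to the matching block of x and stacking what is kept. The names per_block and matches are just for illustration.

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)
b = np.array([[True, True, False], [False, True, True]])

# b[0] filters the rows of x[0], b[1] filters the rows of x[1];
# the kept rows are then stacked in order
per_block = np.concatenate([x[i][b[i]] for i in range(len(x))])

matches = np.array_equal(per_block, x[b])
print(matches)  # True
```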

Upvotes: 2
