Reputation: 6652
I want to apply boolean masking both to rows and columns.
With
X = np.array([[1,2,3],[4,5,6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])
X[mask1, mask2]
I expect the output to be
array([[1,2],[4,5]])
instead of
array([1,5])
It's known that
X[:, mask2]
can be used here but that's not a solution for the general case.
I would like to know how it works under the hood and why in this case the result is array([1,5]).
Upvotes: 8
Views: 5676
Reputation: 173
In a more general sense, your question is about extracting the sub-array made up of certain rows and columns.
main_array = np.array([[1,2,3],[4,5,6]])
mask_ax_0 = np.array([True, True])  # which rows to keep
mask_ax_1 = np.array([True, True, False])  # which columns to keep
Answer:
mask_2d = np.logical_and(mask_ax_0.reshape(-1,1), mask_ax_1.reshape(1,-1))
sub_array = main_array[mask_2d].reshape(np.sum(mask_ax_0), np.sum(mask_ax_1))
print(sub_array)
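The same idea can be wrapped in a small function so the reshape sizes always follow the masks. This is only a sketch: `select_rows_cols` is a hypothetical helper name, not a NumPy function, and it assumes both masks are 1-D boolean arrays matching the array's shape.

```python
import numpy as np

def select_rows_cols(arr, row_mask, col_mask):
    """Hypothetical helper: keep rows where row_mask is True and columns where col_mask is True."""
    # Column vector AND row vector broadcasts to a full 2-D mask.
    mask_2d = np.logical_and(row_mask[:, None], col_mask[None, :])
    # Boolean indexing flattens the result, so restore the rectangle.
    return arr[mask_2d].reshape(row_mask.sum(), col_mask.sum())

main_array = np.array([[1, 2, 3], [4, 5, 6]])
sub_array = select_rows_cols(
    main_array,
    np.array([False, True]),        # keep only the second row
    np.array([True, False, True]),  # keep the first and third columns
)
print(sub_array)  # [[4 6]]
```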
Upvotes: 0
Reputation: 231385
X[mask1, mask2]
is described in Boolean Array Indexing Doc as the equivalent of
In [249]: X[mask1.nonzero()[0], mask2.nonzero()[0]]
Out[249]: array([1, 5])
In [250]: X[[0,1], [0,1]]
Out[250]: array([1, 5])
In effect it is giving you X[0,0] and X[1,1] (pairing the 0s and 1s).
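To make the pairing concrete, here is a small sketch that rebuilds the result by hand with `zip`, and also shows what happens when the two masks have a different number of True entries. The names `paired`, `all_cols` and `mismatch_raises` are just for illustration.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])

# Each boolean mask is first converted to the positions of its True entries.
rows = mask1.nonzero()[0]  # positions 0 and 1
cols = mask2.nonzero()[0]  # positions 0 and 1

# The two position arrays are then walked in lockstep, like zip().
paired = np.array([X[r, c] for r, c in zip(rows, cols)])
print(paired)           # [1 5]
print(X[mask1, mask2])  # [1 5]

# With 2 True entries on one axis and 3 on the other, the position arrays
# cannot be paired (or broadcast), so NumPy refuses the index.
all_cols = np.array([True, True, True])
try:
    X[mask1, all_cols]
    mismatch_raises = False
except IndexError:
    mismatch_raises = True
print(mismatch_raises)  # True
```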
What you want instead is:
In [251]: X[[[0],[1]], [0,1]]
Out[251]:
array([[1, 2],
[4, 5]])
np.ix_ is a handy tool for creating the right mix of dimensions:
In [258]: np.ix_([0,1],[0,1])
Out[258]:
(array([[0],
[1]]), array([[0, 1]]))
In [259]: X[np.ix_([0,1],[0,1])]
Out[259]:
array([[1, 2],
[4, 5]])
That's effectively a column vector for the 1st axis and row vector for the second, together defining the desired rectangle of values.
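As a rough illustration of why the column/row combination yields a rectangle, the index arrays can be broadcast explicitly with `np.broadcast_arrays`. The variable names here are only for the sketch.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])

rows = np.array([[0], [1]])  # shape (2, 1): a column vector of row indices
cols = np.array([0, 1])      # shape (2,):   a row vector of column indices

# Broadcasting expands both index arrays to the common shape (2, 2).
full_rows, full_cols = np.broadcast_arrays(rows, cols)
print(full_rows)  # [[0 0]
                  #  [1 1]]
print(full_cols)  # [[0 1]
                  #  [0 1]]

# Indexing with the expanded arrays picks one element per (row, col) pair.
result = X[full_rows, full_cols]
print(result)     # [[1 2]
                  #  [4 5]]
```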
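Trying to broadcast the boolean arrays themselves in the same way does not work: X[mask1[:,None], mask2] raises an IndexError, because a 2-D boolean index is matched against two axes of X instead of being broadcast.
However, that reference section says: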
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
In [260]: X[np.ix_(mask1, mask2)]
Out[260]:
array([[1, 2],
[4, 5]])
In [261]: np.ix_(mask1, mask2)
Out[261]:
(array([[0],
[1]], dtype=int32), array([[0, 1]], dtype=int32))
The boolean section of ix_:
if issubdtype(new.dtype, _nx.bool_):
    new, = new.nonzero()
So it works with a mix like X[np.ix_(mask1, [0,2])]
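A short sketch of that mix, plus a case where the masks select a non-square block (`row_mask` and `col_mask` are made-up names for the second example):

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
mask1 = np.array([True, True])

# Boolean mask for the rows, integer list for the columns.
mixed = X[np.ix_(mask1, [0, 2])]
print(mixed)  # [[1 3]
              #  [4 6]]

# Masks with different numbers of True entries are fine with np.ix_.
row_mask = np.array([False, True])
col_mask = np.array([True, False, True])
block = X[np.ix_(row_mask, col_mask)]
print(block)  # [[4 6]]
```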
Upvotes: 6
Reputation: 152677
One solution would be to use sequential integer indexing, getting the integers from np.where (rows first with mask1, then columns with mask2), for example:
>>> X[np.where(mask1)[0]][:, np.where(mask2)[0]]
array([[1, 2],
[4, 5]])
Or, as @user2357112 pointed out in the comments, np.ix_ could be used as well. For example:
>>> X[np.ix_(np.where(mask1)[0], np.where(mask2)[0])]
array([[1, 2],
[4, 5]])
Another idea would be to broadcast your masks and index in one step, which requires a reshape afterwards:
>>> X[np.where(mask1[:, None] * mask2)]
array([1, 2, 4, 5])
>>> X[np.where(mask1[:, None] * mask2)].reshape(2, 2)
array([[1, 2],
[4, 5]])
Upvotes: 2
Reputation: 42
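You should be using the numpy.ma module. In particular, you could use mask_rowcols. Note that numpy.ma marks hidden entries with True, so the keep-masks have to be inverted, and that the result hides the unwanted entries rather than removing them:
import numpy as np
import numpy.ma as ma
X = np.array([[1,2,3],[4,5,6]])
linesmask = np.array([True, True])
colsmask = np.array([True, True, False])
# Flag one cell in every unwanted row, then let mask_rowcols hide those rows
row_flags = np.zeros(X.shape, dtype=bool)
row_flags[:, 0] = ~linesmask
rows = ma.mask_rowcols(ma.masked_array(X, mask=row_flags), axis=0)
# Same for the unwanted columns
col_flags = np.zeros(X.shape, dtype=bool)
col_flags[0, :] = ~colsmask
cols = ma.mask_rowcols(ma.masked_array(X, mask=col_flags), axis=1)
# Combine the row and column masks into the final masked array
X = ma.masked_array(X, mask=ma.getmaskarray(rows) | ma.getmaskarray(cols))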
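A masked array keeps its original shape and only hides entries, so it does not directly give the 2x2 result asked for in the question. One possible way to get a plain sub-array back is `compressed()` followed by a reshape. This is a sketch that builds the mask directly from the two keep-masks; the names `keep`, `masked` and `sub` are illustrative.

```python
import numpy as np
import numpy.ma as ma

X = np.array([[1, 2, 3], [4, 5, 6]])
linesmask = np.array([True, True])       # rows to keep
colsmask = np.array([True, True, False]) # columns to keep

# True in `keep` marks wanted cells; numpy.ma wants True for hidden cells.
keep = linesmask[:, None] & colsmask[None, :]
masked = ma.masked_array(X, mask=~keep)

# compressed() returns the visible values as a flat ndarray.
sub = masked.compressed().reshape(linesmask.sum(), colsmask.sum())
print(sub)  # [[1 2]
            #  [4 5]]
```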
Upvotes: -2