Reputation: 381
I have a 2D numpy array like this:
arr = np.array([[1,2,4],
[2,1,1],
[1,2,3]])
and a boolean array:
boolarr = np.array([[True, True, False],
[False, False, True],
[True, True,True]])
Now, when I index arr with boolarr, I get a flat 1D array:
arr[boolarr]
Output:
array([1, 2, 1, 1, 2, 3])
But I am looking to have a 2D array output instead. The desired output is
[[1, 2],
[1],
[1, 2, 3]]
Upvotes: 9
Views: 4142
Reputation: 88226
One option using numpy is to start by summing each row of the mask:
take = boolarr.sum(axis=1)
#array([2, 1, 3])
Then mask the array as you do:
x = arr[boolarr]
#array([1, 2, 1, 1, 2, 3])
And use np.split to split the flat array according to the np.cumsum of take (as the function expects the indices at which to split the array):
np.split(x, np.cumsum(take)[:-1])
[array([1, 2]), array([1]), array([1, 2, 3])]
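To see why the last cumulative sum is dropped, here is a small sketch that reuses the arrays from the question. The names bounds, with_last and per_row are only illustrative. The final running total equals the length of the flat selection, so splitting there would just append an empty trailing array.

```python
import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

take = boolarr.sum(axis=1)   # selected elements per row: [2, 1, 3]
bounds = np.cumsum(take)     # running totals: [2, 3, 6]
x = arr[boolarr]             # flat selection: [1, 2, 1, 1, 2, 3]

# Splitting at every running total leaves an empty array at the end,
# because the last boundary (6) equals len(x).
with_last = np.split(x, bounds)

# Dropping the last boundary gives exactly one piece per row.
per_row = np.split(x, bounds[:-1])

print(len(with_last), len(per_row))  # 4 3
```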
General solution
import numpy as np

def mask_nd(x, m):
    '''
    Mask a 2D array and preserve the
    dimension on the resulting array

    Parameters
    ----------
    x: np.array
        2D array on which to apply a mask
    m: np.array
        2D boolean mask

    Returns
    -------
    List of arrays. Each array contains the
    elements from the rows in x once masked.
    If no elements in a row are selected the
    corresponding array will be empty
    '''
    # Number of selected elements in each row
    take = m.sum(axis=1)
    # Split the flat selection at the row boundaries
    return np.split(x[m], np.cumsum(take)[:-1])
Examples
Let's have a look at some examples:
arr = np.array([[1,2,4],
[2,1,1],
[1,2,3]])
boolarr = np.array([[True, True, False],
[False, False, False],
[True, True,True]])
mask_nd(arr, boolarr)
# [array([1, 2]), array([], dtype=int32), array([1, 2, 3])]
Or for the following arrays:
arr = np.array([[1,2],
[2,1]])
boolarr = np.array([[True, True],
[True, False]])
mask_nd(arr, boolarr)
# [array([1, 2]), array([2])]
Upvotes: 5
Reputation: 231335
In [183]: np.array([x[y] for x,y in zip(arr, boolarr)], dtype=object)   # dtype=object is required for ragged rows on recent numpy
Out[183]: array([array([1, 2]), array([1]), array([1, 2, 3])], dtype=object)
This row-by-row list comprehension should be competitive in speed. (It's a little faster if we omit the outer np.array wrap and return just a list of arrays.)
But realistic timing tests are needed to be sure.
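As a rough sketch of what such a timing test might look like, the snippet below compares the row-by-row comprehension with the np.split approach on random data. The array size, the seed, the repeat count and the helper names by_rows and by_split are arbitrary assumptions. Actual timings depend on the machine, the array shape and the mask density, so no numbers are quoted here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)              # arbitrary seed for repeatability
arr = rng.integers(0, 10, size=(1000, 50))  # hypothetical test data
boolarr = rng.random(arr.shape) < 0.5       # roughly half the entries kept

def by_rows(a, m):
    # Row-by-row boolean indexing, returned as a plain list of arrays
    return [row[keep] for row, keep in zip(a, m)]

def by_split(a, m):
    # Flat boolean indexing followed by np.split at the row boundaries
    return np.split(a[m], np.cumsum(m.sum(axis=1))[:-1])

# Both approaches should select the same elements for every row
same = all(np.array_equal(p, q)
           for p, q in zip(by_rows(arr, boolarr), by_split(arr, boolarr)))

t_rows = timeit.timeit(lambda: by_rows(arr, boolarr), number=20)
t_split = timeit.timeit(lambda: by_split(arr, boolarr), number=20)

print(f"same result: {same}")
print(f"by_rows:  {t_rows:.4f} s for 20 runs")
print(f"by_split: {t_split:.4f} s for 20 runs")
```

Varying the shape (many short rows versus a few long ones) is worth trying, since the per-row Python overhead matters most when there are many rows.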
Upvotes: 0
Reputation: 114230
You may be looking for something as simple as a masked array. You can use the mask to create an array that hides the unwanted values, so that they are not affected by further operations and don't affect the results of calculations:
marr = np.ma.array(arr, mask=~boolarr)
Notice that the mask must be flipped since it's the invalid elements that are masked. The result will look like
masked_array(data=[
[ 1 2 --]
[-- -- 1]
[ 1 2 3]],
mask=[
[False False True]
[ True True False]
[False False False]],
fill_value = 999999)
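As a brief illustration of masked values being left out of calculations, the sketch below applies a few reductions to the masked array built from the question's data. The variable names row_sums, row_counts and flat are only illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

# The mask marks invalid entries, hence the inversion of boolarr
marr = np.ma.array(arr, mask=~boolarr)

row_sums = marr.sum(axis=1)      # masked entries are ignored
row_counts = marr.count(axis=1)  # number of unmasked entries per row
flat = marr.compressed()         # 1D array of the unmasked values

print(row_sums)    # [3 1 6]
print(row_counts)  # [2 1 3]
print(flat)        # [1 2 1 1 2 3]
```

The array keeps its 3x3 shape throughout, which is the main difference from the list-of-arrays answers.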
Upvotes: 1
Reputation: 164613
Your desired output is not a 2D array, since each "row" has a different number of "columns". A functional non-vectorised solution is possible via itertools.compress:
from itertools import compress
res = list(map(list, map(compress, arr, boolarr)))
# [[1, 2], [1], [1, 2, 3]]
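For readers unfamiliar with it, compress(data, selectors) yields the items of data whose matching selector is truthy. The variant below is a sketch that spells the mapping out as a comprehension. It also converts both arrays with tolist() first, an optional step assumed here so that the result holds plain Python ints rather than NumPy scalars.

```python
from itertools import compress
import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

# Pair each data row with its mask row and keep the selected items
res = [list(compress(row, keep))
       for row, keep in zip(arr.tolist(), boolarr.tolist())]

print(res)  # [[1, 2], [1], [1, 2, 3]]
```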
Upvotes: 3
Reputation: 2036
Here's one way to do it with a list instead:
[[arr[row][col] for col in range(3) if boolarr[row][col]] for row in range(3)]
# [[1,2], [1], [1,2,3]]
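The comprehension above hardcodes range(3), so it only fits 3x3 input. A possible generalisation, sketched here on the 2x2 arrays used earlier in the thread, reads the bounds from arr.shape instead. The names n_rows and n_cols and the int() conversion are additions for illustration.

```python
import numpy as np

arr = np.array([[1, 2],
                [2, 1]])
boolarr = np.array([[True, True],
                    [True, False]])

# Take the loop bounds from the array itself rather than hardcoding them
n_rows, n_cols = arr.shape

res = [[int(arr[row, col]) for col in range(n_cols) if boolarr[row, col]]
       for row in range(n_rows)]

print(res)  # [[1, 2], [2]]
```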
Upvotes: 0