user3788040

Reputation: 381

Mask 2D array preserving shape

I have a 2D numpy array like this:

arr = np.array([[1,2,4],
                [2,1,1],
                [1,2,3]])

and a boolean array:

boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True,True]])

When I index arr with boolarr, the result is flattened to a 1D array:

arr[boolarr]

Output:

array([1, 2, 1, 1, 2, 3])

But I am looking to have a 2D array output instead. The desired output is

[[1, 2],
 [1],
 [1, 2, 3]]

Upvotes: 9

Views: 4142

Answers (5)

yatu

Reputation: 88226

One option using numpy is to start by counting the True values in each row of the mask:

take = boolarr.sum(axis=1)
#array([2, 1, 3])

Then mask the array as you do:

x = arr[boolarr]
#array([1, 2, 1, 1, 2, 3])

Then use np.split to split the flat array at the np.cumsum of take, since the function expects the indices at which to split the array. The last cumulative sum is dropped because it equals the total length:

np.split(x, np.cumsum(take)[:-1])
[array([1, 2]), array([1]), array([1, 2, 3])]
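
To see why the last cumulative sum is dropped, here is a small check on the arrays from the question. It is only an illustrative sketch; the names offsets, with_trailing and pieces are mine and not part of the approach itself.

```python
import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

take = boolarr.sum(axis=1)   # selected elements per row: [2, 1, 3]
offsets = np.cumsum(take)    # running totals: [2, 3, 6]
x = arr[boolarr]             # flat selection: [1, 2, 1, 1, 2, 3]

# Splitting at every offset, including the last one (the total length),
# leaves an extra empty array at the end.
with_trailing = np.split(x, offsets)

# Dropping the last offset gives exactly one piece per row.
pieces = np.split(x, offsets[:-1])
```

With the full list of offsets, np.split returns four pieces and the last one is empty; with offsets[:-1] it returns three, one per row.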

General solution

def mask_nd(x, m):
    '''
    Mask a 2D array row by row, keeping one
    result per row of the input.

    Parameters
    ----------
    x: np.array
       2D array on which to apply a mask
    m: np.array
       2D boolean mask with the same shape as x

    Returns
    -------
    List of arrays. Each array contains the
    elements of the corresponding row of x
    selected by the mask. If no elements in a
    row are selected, that array is empty.
    '''
    take = m.sum(axis=1)
    return np.split(x[m], np.cumsum(take)[:-1])

Examples

Let's have a look at some examples:

arr = np.array([[1,2,4],
                [2,1,1],
                [1,2,3]])

boolarr = np.array([[True, True, False],
                    [False, False, False],
                    [True, True,True]])

mask_nd(arr, boolarr)
# [array([1, 2]), array([], dtype=int32), array([1, 2, 3])]

Or for the following arrays:

arr = np.array([[1,2],
                [2,1]])

boolarr = np.array([[True, True],
                    [True, False]])

mask_nd(arr, boolarr)
# [array([1, 2]), array([2])]

Upvotes: 5

hpaulj

Reputation: 231335

In [183]: np.array([x[y] for x,y in zip(arr, boolarr)], dtype=object)
Out[183]: array([array([1, 2]), array([1]), array([1, 2, 3])], dtype=object)

should be competitive in speed. (It's a little faster if we omit the outer np.array wrap, returning just a list of arrays.)

But realistic time tests are needed to be sure.
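
As a starting point for such a test, here is a rough timing sketch. The array size, the 50% mask density, the repeat count and the helper names with_split and with_loop are all arbitrary choices of mine, so treat the numbers it prints as indicative only.

```python
import timeit
import numpy as np

# Hypothetical test data: 1000 rows, 50 columns, about half the entries selected.
rng = np.random.default_rng(0)
arr = rng.integers(0, 10, size=(1000, 50))
boolarr = rng.random((1000, 50)) < 0.5

def with_split(x, m):
    # Flat boolean indexing followed by np.split at the row boundaries.
    return np.split(x[m], np.cumsum(m.sum(axis=1))[:-1])

def with_loop(x, m):
    # Row-by-row boolean indexing, returning a plain list of arrays.
    return [row[mask] for row, mask in zip(x, m)]

# Sanity check: both approaches should select the same elements per row.
a = with_split(arr, boolarr)
b = with_loop(arr, boolarr)
same = len(a) == len(b) and all(np.array_equal(p, q) for p, q in zip(a, b))

t_split = timeit.timeit(lambda: with_split(arr, boolarr), number=20)
t_loop = timeit.timeit(lambda: with_loop(arr, boolarr), number=20)
print(f"split: {t_split:.4f}s  loop: {t_loop:.4f}s  same result: {same}")
```

Results will depend on the array shape and on how sparse the mask is, so it is worth rerunning with data that resembles the real use case.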

Upvotes: 0

Mad Physicist

Reputation: 114230

You may be looking for something as simple as a masked array. You can use the mask to create an array that masks out the unwanted values, so that they are not affected by further operations and don't affect the results of calculations:

marr = np.ma.array(arr, mask=~boolarr)

Notice that the mask must be flipped since it's the invalid elements that are masked. The result will look like

masked_array(data=[
        [ 1  2 --]
        [-- --  1]
        [ 1  2  3]],
    mask=[
        [False False  True]
        [ True  True False]
        [False False False]],
    fill_value = 999999)
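
As a brief illustration of what "don't affect the results of calculations" means in practice, the sketch below sums each row of the masked array and also recovers the ragged rows from it. The names row_sums and rows are mine, chosen just for this example.

```python
import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

# The mask marks invalid entries, so the selection mask is inverted.
marr = np.ma.array(arr, mask=~boolarr)

# Reductions skip the masked entries.
row_sums = marr.sum(axis=1)

# compressed() returns only the unmasked values of each row as a plain array.
rows = [row.compressed() for row in marr]
```

Here row_sums holds 3, 1 and 6, and rows contains the same three arrays as the list-based answers, while marr itself keeps the original 2D shape.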

Upvotes: 1

jpp

Reputation: 164613

Your desired output is not a 2D array, since each "row" has a different number of "columns". A functional non-vectorised solution is possible via itertools.compress:

from itertools import compress

res = list(map(list, map(compress, arr, boolarr)))

# [[1, 2], [1], [1, 2, 3]]
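
If the nested map calls are hard to read, the same idea can be written as a comprehension. This is only an equivalent sketch: the .tolist() conversion is my addition, so that the result holds plain Python ints rather than numpy scalars.

```python
from itertools import compress

import numpy as np

arr = np.array([[1, 2, 4],
                [2, 1, 1],
                [1, 2, 3]])
boolarr = np.array([[True, True, False],
                    [False, False, True],
                    [True, True, True]])

# compress(data, selectors) yields the items of data whose selector is truthy.
# zip pairs each row of values with the matching row of the mask.
res = [list(compress(row, mask))
       for row, mask in zip(arr.tolist(), boolarr.tolist())]
```

For the arrays above, res is [[1, 2], [1], [1, 2, 3]].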

Upvotes: 3

Stephen C

Reputation: 2036

Here's one way to do it with a list comprehension instead:

[[arr[row][col] for col in range(arr.shape[1]) if boolarr[row][col]] for row in range(arr.shape[0])]
# [[1,2], [1], [1,2,3]]

Upvotes: 0
