Aditya369

Reputation: 834

python numpy get masked data without flattening

How do I get the masked data only without flattening the data into a 1D array? That is, suppose I have a numpy array

a = np.array([[0, 1, 2, 3],
              [0, 1, 2, 3],
              [0, 1, 2, 3]])

and I mask all elements greater than 1,

b = ma.masked_greater(a, 1)

masked_array(data =
 [[0 1 -- --]
 [0 1 -- --]
 [0 1 -- --]],
             mask =
 [[False False  True  True]
 [False False  True  True]
 [False False  True  True]],
       fill_value = 999999)

How do I get only the masked elements without flattening the output? That is, I need to get

array([[2, 3],
       [2, 3],
       [2, 3]])

Upvotes: 2

Views: 4641

Answers (3)

hpaulj

Reputation: 231335

Let's try an example that produces a ragged result: a different number of 'masked' values in each row.

In [292]: a=np.arange(12).reshape(3,4)
In [293]: a
Out[293]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [294]: a<6
Out[294]: 
array([[ True,  True,  True,  True],
       [ True,  True, False, False],
       [False, False, False, False]], dtype=bool)

Boolean indexing returns the flattened list of values that match this condition. The selection can't form a regular 2d array, so numpy has to fall back to a flattened array.

In [295]: a[a<6]
Out[295]: array([0, 1, 2, 3, 4, 5])

Do the same thing, but iterating row by row:

In [296]: [a1[a1<6] for a1 in a]
Out[296]: [array([0, 1, 2, 3]), array([4, 5]), array([], dtype=int32)]

Trying to make an array of the result produces an object dtype array, which is little more than a list in an array wrapper (recent numpy versions require passing dtype=object explicitly for ragged input like this):

In [297]: np.array([a1[a1<6] for a1 in a])
Out[297]: array([array([0, 1, 2, 3]), array([4, 5]), array([], dtype=int32)], dtype=object)

The fact that the result is ragged is a good indicator that it is difficult, if not impossible, to perform that action with one vectorized operation.
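When every row masks the same number of elements, as in the question's example, the result is not ragged and the flattened selection can simply be reshaped. Here is a minimal sketch under that assumption; the helper name masked_rows is made up for illustration and is not a numpy function.

```python
import numpy as np
import numpy.ma as ma

def masked_rows(marr):
    """Return the masked elements of a 2d masked array as a 2d array.

    Hypothetical helper: only valid when every row masks the same
    number of elements, otherwise the result would be ragged.
    """
    mask = ma.getmaskarray(marr)      # full boolean mask, even if nothing is masked
    counts = mask.sum(axis=1)         # number of masked elements in each row
    if not (counts == counts[0]).all():
        raise ValueError("rows mask different numbers of elements")
    # Boolean indexing flattens in row-major order, so a reshape restores the rows.
    return marr.data[mask].reshape(len(counts), int(counts[0]))

a = np.array([[0, 1, 2, 3],
              [0, 1, 2, 3],
              [0, 1, 2, 3]])
b = ma.masked_greater(a, 1)
result = masked_rows(b)
```

For a mask like a<6 above, where the per-row counts differ, this helper raises instead of returning a 2d array.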


Here's another way of producing the list of arrays. With sum I find how many elements there are in each row, and then use the cumulative sum of those counts as the points at which to split the flattened array into sublists.

In [320]: idx=(a<6).sum(1).cumsum()[:-1]
In [321]: idx
Out[321]: array([4, 6], dtype=int32)
In [322]: np.split(a[a<6], idx)
Out[322]: [array([0, 1, 2, 3]), array([4, 5]), array([], dtype=float64)]

This does use 'flattening', and for these small examples it is slower than the row iteration. (Don't worry about the empty float array; split had to construct something and used a default dtype.)


A different mask, without empty rows, clearly shows the equivalence of the two approaches.

In [344]: mask=np.tri(3,4,dtype=bool)  # lower tri
In [345]: mask
Out[345]: 
array([[ True, False, False, False],
       [ True,  True, False, False],
       [ True,  True,  True, False]], dtype=bool)
In [346]: idx=mask.sum(1).cumsum()[:-1]
In [347]: idx
Out[347]: array([1, 3], dtype=int32)
In [348]: [a1[m] for a1,m in zip(a,mask)]
Out[348]: [array([0]), array([4, 5]), array([ 8,  9, 10])]
In [349]: np.split(a[mask],idx)
Out[349]: [array([0]), array([4, 5]), array([ 8,  9, 10])]
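For reuse, the two approaches can be wrapped in small helpers and compared directly. This is only a sketch; the function names select_by_iteration and select_by_split are hypothetical.

```python
import numpy as np

def select_by_iteration(a, mask):
    """Apply each row's mask separately; returns a list of 1d arrays."""
    return [row[m] for row, m in zip(a, mask)]

def select_by_split(a, mask):
    """Select once (flattened), then split at the cumulative per-row counts."""
    idx = mask.sum(axis=1).cumsum()[:-1]   # split points between consecutive rows
    return np.split(a[mask], idx)

a = np.arange(12).reshape(3, 4)
mask = np.tri(3, 4, dtype=bool)            # lower triangle, same mask as above

by_rows = select_by_iteration(a, mask)
by_split = select_by_split(a, mask)
```

Both return a list with one 1d array per row, so the caller still has to handle rows of different lengths.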

Upvotes: 2

Martin Konecny

Reputation: 59571

Zip the data and mask lists together, then keep only the elements whose mask value is True:

data = [[0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1]]

mask = [[False, False,  True,  True],
 [False, False,  True,  True],
 [False, False,  True,  True]]

# list(...) so the pairs can be printed and reused (zip is lazy in Python 3)
zipped = list(zip(data, mask))  # [([0, 1, 1, 1], [False, False, True, True]), ([0, 1, 1, 1], [False, False, True, True]), ([0, 1, 1, 1], [False, False, True, True])]

masked = []
for lst, row_mask in zipped:  # row_mask avoids shadowing the outer mask
    pairs = zip(lst, row_mask)  # [(0, False), (1, False), (1, True), (1, True)]
    masked.append([num for num, b in pairs if b])

print(masked)  # [[1, 1], [1, 1], [1, 1]]

or more succinctly:

masked = [[num for num, b in zip(lst, row_mask) if b] for lst, row_mask in zip(data, mask)]
print(masked)  # [[1, 1], [1, 1], [1, 1]]
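The same per-row filtering can also be written with itertools.compress from the standard library, which yields the items whose selector is truthy. A short sketch using the same data and mask:

```python
from itertools import compress

data = [[0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1]]
mask = [[False, False, True, True],
        [False, False, True, True],
        [False, False, True, True]]

# compress(row, row_mask) keeps the items of row where row_mask is truthy.
masked = [list(compress(row, row_mask)) for row, row_mask in zip(data, mask)]
print(masked)  # [[1, 1], [1, 1], [1, 1]]
```

This works on plain lists and does not require numpy, at the cost of a Python-level loop over the rows.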

Upvotes: 1

Greg Nisbet

Reputation: 6994

Because np.where is vectorized, you can use it to keep the items of the array where the mask is True and put None (or some arbitrary value) in the places where a value has been masked out. Note that None forces a less compact object dtype for the array, so you may want to use -1 or some other special value instead.

import numpy as np

a = np.array([
    [0, 1, 2, 3],
    [0, 1, 2, 3],
    [0, 1, 2, 3]])

mask = np.array([[ True,  True,  True,  True],
    [ True, False,  True,  True],
    [False,  True,  True, False]])

np.where(mask, a, None)

This produces

array([[0, 1, 2, 3],
       [0, None, 2, 3],
       [None, 1, 2, None]], dtype=object)
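If a numeric placeholder is acceptable, a sentinel keeps the integer dtype instead of falling back to object. A sketch, assuming -1 never occurs as a real value in the data (the name SENTINEL is just for illustration):

```python
import numpy as np

a = np.array([[0, 1, 2, 3],
              [0, 1, 2, 3],
              [0, 1, 2, 3]])

mask = np.array([[ True,  True,  True,  True],
                 [ True, False,  True,  True],
                 [False,  True,  True, False]])

SENTINEL = -1  # assumption: -1 is not a valid data value here

# Keep a where mask is True, write the sentinel elsewhere; the dtype stays integer.
filled = np.where(mask, a, SENTINEL)
```

The shape is preserved, so the placeholder positions still have to be skipped by whatever consumes the array.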

Upvotes: 1
