S.EB

Reputation: 2226

Masking out some rows of numpy array and recover back

I have a mask mask_re of shape (8781288, 1) containing ones and zeros, a label array y_lbl of shape (8781288, 1), and a feature array feat_re of shape (8781288, 64). I need to keep only those rows of the feature and label arrays where the mask is 1. How can I do this, and how can I do the reverse: put the prediction values (ypred) back into a full-size label array at the positions where the mask is 1?

For example, in Matlab this is easy: X=feat_re(mask_re==1) does the selection, and new_lbl(mask_re==1)=ypred puts the values back, where new_lbl=zeros(8781288, 1). I tried to do a similar thing in Python:

 X=feat_re[np.where(mask_re==1),:]
 X.shape
(2, 437561, 64)

EDITED (SOLVED), following what @hpaulj suggested

The problem was the shape of my mask. Once I changed it to mask_new=mask_re.reshape((8781288)), indexing worked as expected:

X=feat_re[mask_new==1,:]
X.shape
(437561, 64)
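
For the reverse step (the Matlab new_lbl(mask_re==1)=ypred), the same 1-D boolean mask can be used on the left-hand side of an assignment. The following is a minimal sketch with small stand-in arrays rather than the real 8781288-row data; the ypred values are made up purely for illustration.

```python
import numpy as np

# Small stand-in arrays; the real ones have 8781288 rows and 64 features.
mask_re = np.array([[1], [0], [1], [1], [0]])   # shape (5, 1)
feat_re = np.arange(5 * 4).reshape(5, 4)        # shape (5, 4)

# Flatten the column mask to a 1-D boolean mask of shape (5,).
mask_new = mask_re.reshape(-1) == 1

# Select only the rows where the mask is 1.
X = feat_re[mask_new, :]                        # shape (3, 4)

# Hypothetical predictions, one per selected row.
ypred = np.array([7, 8, 9])

# Put the predictions back into a full-size label array.
new_lbl = np.zeros(mask_re.shape[0], dtype=ypred.dtype)
new_lbl[mask_new] = ypred
new_lbl = new_lbl.reshape(-1, 1)                # back to column shape (5, 1)
```

Rows where the mask is 0 keep the initial value of 0, which matches the zeros(...) initialisation in the Matlab version.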

Upvotes: 0

Views: 409

Answers (2)

Dev Khadka

Reputation: 5451

You can use boolean indexing for masking, as below:

X = feat_re[mask_re.ravel()==1, :]
X = X.reshape(2, -1, 64)

This selects the rows of feat_re where (mask_re==1) is True. You can then reshape X with the reshape function, and use reshape again to get back to the original shape. A "-1" in reshape means that NumPy calculates the size of that dimension from the others.
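
As a small illustration of how "-1" behaves in reshape, here is a sketch on a toy array (the sizes are chosen only for the example and are not the sizes from the question):

```python
import numpy as np

a = np.arange(24)

# -1 is inferred from the remaining sizes: 24 / (2 * 4) = 3.
b = a.reshape(2, -1, 4)

# Reshaping with a single -1 flattens back to 1-D.
c = b.reshape(-1)
```

Note that reshape raises a ValueError when the total number of elements is not divisible by the product of the given dimensions, so the inferred size must come out as a whole number.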

Upvotes: 0

hpaulj

Reputation: 231385

In [182]: arr = np.arange(12).reshape(3,4)                                      
In [183]: mask = np.array([1,0,1], bool)                                        
In [184]: arr[mask,:]                                                           
Out[184]: 
array([[ 0,  1,  2,  3],
       [ 8,  9, 10, 11]])
In [185]: new = np.zeros_like(arr)                                              
In [186]: new[mask,:] = np.array([10,12,14,16])                                 
In [187]: new                                                                   
Out[187]: 
array([[10, 12, 14, 16],
       [ 0,  0,  0,  0],
       [10, 12, 14, 16]])

I suspect your error comes from the shape of mask:

In [188]: mask1 = mask[:,None]                                                  
In [189]: mask1.shape                                                           
Out[189]: (3, 1)
In [190]: arr[mask1,:]                                                          
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-190-6317c3ea0302> in <module>
----> 1 arr[mask1,:]

IndexError: too many indices for array

Remember, numpy can have 1d and 0d arrays; it doesn't force everything to be 2d.
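
If the mask arrives as a column, as in the question, it can be turned into a 1d mask before indexing. A short sketch of a few equivalent ways to do that, reusing the arr and mask1 shapes from above:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
mask1 = np.array([[1], [0], [1]], dtype=bool)   # shape (3, 1)

rows_a = arr[mask1[:, 0], :]                # take the single column
rows_b = arr[mask1.ravel(), :]              # flatten to 1d
rows_c = arr[np.squeeze(mask1, axis=1), :]  # drop the length-1 axis
```

All three give the same two rows as arr[mask,:] with the original 1d mask.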

With where (aka nonzero):

In [191]: np.nonzero(mask)                                                      
Out[191]: (array([0, 2]),)     # 1 element tuple
In [192]: np.nonzero(mask1)                                                     
Out[192]: (array([0, 2]), array([0, 0]))    # 2 element tuple
In [193]: arr[_191]            # using the mask index                                                  
Out[193]: 
array([[ 0,  1,  2,  3],
       [ 8,  9, 10, 11]])

Upvotes: 1
