Cerin

Reputation: 64709

How to convert an array index to/from a mask

Say I have an array like:

import numpy as np
a1 = np.array([.1, .2, 23., 4.3, 3.2, .1, .05, .2, .3, 4.2, 7.6])

I then filter out all values less than 1, and also build a masked array that hides them:

a2 = a1[a1 >= 1]
a2_mask = np.ma.masked_where(a1 < 1, a1)

And then search for a specific value:

a2_idx = np.where(a2==3.2)[0][0]

How would I convert that index to the corresponding index in the original array?

e.g.

>>> a2_idx
2
>>> a1_idx = reframe_index(a2_idx, a2_mask)
>>> a1_idx
4

My naive implementation would be:

def reframe_index(old_idx, mask):
    cnt = 0
    ref = 0
    for v in mask:
        if not isinstance(v, (int, float)):
            cnt += 1
        else:
            if ref == old_idx:
                return ref + cnt
            ref += 1

Does NumPy have a more efficient way to do this?

Upvotes: 1

Views: 251

Answers (2)

Mad Physicist

Reputation: 114230

I had a similar problem recently, so I wrote haggis.npy_util.unmasked_index [1]. It is overkill for your relatively simple case, because it is designed to work on an arbitrary number of dimensions. That being said, given

>>> import numpy as np
>>> from haggis.npy_util import unmasked_index
>>> arr = np.array([.1, .2, 23., 4.3, 3.2, .1, .05, .2, .3, 4.2, 7.6])

and

>>> mask = arr >= 1
>>> mask
array([False, False,  True,  True,  True, False, False, False, False,
       True,  True])

You can do something like

>>> idx = unmasked_index(np.flatnonzero(arr[mask] == 3.2), mask)
>>> idx
array([4])

If you ever need it, there is also an inverse function haggis.npy_util.masked_index that converts a location in a multidimensional input array into its index in the masked array.
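
For the 1-D case, both directions can also be sketched in plain NumPy without haggis. This is only an illustration of the idea, not how haggis implements it: the names to_original and to_masked are made up for this example, and -1 is an arbitrary marker for elements that were filtered out.

```python
import numpy as np

arr = np.array([.1, .2, 23., 4.3, 3.2, .1, .05, .2, .3, 4.2, 7.6])
mask = arr >= 1

# Forward lookup: position in arr[mask] -> position in arr.
to_original = np.flatnonzero(mask)

# Inverse lookup: position in arr -> position in arr[mask].
# The running count of True values, minus one, is the slot each kept
# element lands in; dropped elements get the placeholder -1.
to_masked = np.where(mask, np.cumsum(mask) - 1, -1)

print(to_original[2])  # 4: arr[mask][2] came from arr[4]
print(to_masked[4])    # 2: arr[4] ends up at arr[mask][2]
print(to_masked[0])    # -1: arr[0] was filtered out
```

Both lookup arrays are built once, so converting many indices afterwards is just fancy indexing.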

[1] Disclaimer: I am the author of haggis.

Upvotes: 0

hpaulj

Reputation: 231335

a2 is a copy, so there's no link between it and a1, apart from the values they happen to share.

In [19]: a2
Out[19]: array([23. ,  4.3,  3.2,  4.2,  7.6])
In [20]: np.nonzero(a2 == 3.2)
Out[20]: (array([2]),)
In [21]: a2[2]
Out[21]: 3.2

The mask of a2_mask, which is just a1 < 1, does give us a way of finding the corresponding element of a1:

In [22]: a2_mask = np.ma.masked_where(a1 < 1, a1)
In [23]: a2_mask
Out[23]: 
masked_array(data=[--, --, 23.0, 4.3, 3.2, --, --, --, --, 4.2, 7.6],
             mask=[ True,  True, False, False, False,  True,  True,  True,
                    True, False, False],
       fill_value=1e+20)
In [24]: a2_mask.compressed()
Out[24]: array([23. ,  4.3,  3.2,  4.2,  7.6])
In [25]: a2_mask.mask
Out[25]: 
array([ True,  True, False, False, False,  True,  True,  True,  True,
       False, False])
In [26]: np.nonzero(~a2_mask.mask)
Out[26]: (array([ 2,  3,  4,  9, 10]),)
In [27]: np.nonzero(~a2_mask.mask)[0][2]
Out[27]: 4
In [28]: a1[4]
Out[28]: 3.2

So you need the mask or indices used to select a2 in the first place. a2 itself does not have the information.

In [30]: np.nonzero(a1>=1)
Out[30]: (array([ 2,  3,  4,  9, 10]),)
In [31]: np.nonzero(a1 >= 1)[0][2]
Out[31]: 4
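
Wrapped up as the reframe_index function from the question, that lookup might look like the sketch below. It assumes the second argument is a boolean array that is True for the elements that were kept (the opposite sense of a masked array's .mask); that signature is a choice made for this example, not something NumPy provides.

```python
import numpy as np

def reframe_index(sub_idx, keep):
    """Map an index into a1[keep] back to the matching index into a1.

    keep is a boolean array, True where the element of a1 was kept.
    """
    # Original positions of the kept elements, in order.
    kept_positions = np.flatnonzero(keep)
    return kept_positions[sub_idx]

a1 = np.array([.1, .2, 23., 4.3, 3.2, .1, .05, .2, .3, 4.2, 7.6])
keep = a1 >= 1
a2 = a1[keep]

a2_idx = np.flatnonzero(a2 == 3.2)[0]
a1_idx = reframe_index(a2_idx, keep)
print(a2_idx, a1_idx)  # 2 4
```

Starting from a masked array such as a2_mask, the same boolean array can be obtained with ~np.ma.getmaskarray(a2_mask).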

Upvotes: 1
