Reputation: 64709
Say I have an array like:
a1 = np.array([.1, .2, 23., 4.3, 3.2, .1, .05, .2, .3, 4.2, 7.6])
And I filter out all values less than 1, and also build a masked array that hides them, like:
a2 = a1[a1 >= 1]
a2_mask = np.ma.masked_where(a1 < 1, a1)
And then search for a specific value:
a2_idx = np.where(a2==3.2)[0][0]
How would I convert that index to the corresponding index in the original array?
e.g.
>>> a2_idx
2
>>> a1_idx = reframe_index(a2_idx, a2_mask)
>>> a1_idx
4
My naive implementation would be:
def reframe_index(old_idx, mask):
    cnt = 0
    ref = 0
    for v in mask:
        if not isinstance(v, (int, float)):
            cnt += 1
        else:
            if ref == old_idx:
                return ref + cnt
            ref += 1
Does Numpy have a more efficient way to do this?
Upvotes: 1
Views: 251
Reputation: 114230
I had a similar problem recently, so I made haggis.npy_util.unmasked_index [1]. This function is overkill for your relatively simple case, because it's intended to operate on an arbitrary number of dimensions. That being said, given
>>> arr = np.array([.1, .2, 23., 4.3, 3.2, .1, .05, .2, .3, 4.2, 7.6])
and
>>> mask = arr >= 1
>>> mask
array([False, False, True, True, True, False, False, False, False,
True, True])
You can do something like
>>> idx = unmasked_index(np.flatnonzero(arr[mask] == 3.2), mask)
>>> idx
array([4])
If you ever need it, there is also an inverse function, haggis.npy_util.masked_index, that converts a location in a multidimensional input array into its index in the masked array.
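For the 1-D case, the inverse direction can be sketched in plain NumPy with a cumulative sum. This is not how haggis implements masked_index; the helper name to_filtered_index is hypothetical, and the sketch assumes every requested position is one that the mask keeps.

```python
import numpy as np

def to_filtered_index(original_idx, keep):
    """Map indices into arr to indices into arr[keep] (1-D only).

    Hypothetical helper; assumes keep[original_idx] is True everywhere.
    """
    # cumsum counts the kept elements up to and including each position,
    # so subtracting 1 gives the zero-based index within arr[keep].
    return np.cumsum(keep)[original_idx] - 1

arr = np.array([.1, .2, 23., 4.3, 3.2, .1, .05, .2, .3, 4.2, 7.6])
mask = arr >= 1
print(to_filtered_index(4, mask))  # 2
```

If a position that the mask drops is passed in, this sketch silently returns the index of the previous kept element (or -1), so validate with keep[original_idx] first if that can happen.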
[1] Disclaimer: I am the author of haggis.
Upvotes: 0
Reputation: 231335
a2 is a copy, so there's no link between it and a1, except that they share some values.
In [19]: a2
Out[19]: array([23. , 4.3, 3.2, 4.2, 7.6])
In [20]: np.nonzero(a2 == 3.2)
Out[20]: (array([2]),)
In [21]: a2[2]
Out[21]: 3.2
The mask of a2_mask, which is just a1 < 1, does give us a way of finding the corresponding element of a1:
In [22]: a2_mask = np.ma.masked_where(a1 < 1, a1)
In [23]: a2_mask
Out[23]:
masked_array(data=[--, --, 23.0, 4.3, 3.2, --, --, --, --, 4.2, 7.6],
mask=[ True, True, False, False, False, True, True, True,
True, False, False],
fill_value=1e+20)
In [24]: a2_mask.compressed()
Out[24]: array([23. , 4.3, 3.2, 4.2, 7.6])
In [25]: a2_mask.mask
Out[25]:
array([ True, True, False, False, False, True, True, True, True,
False, False])
In [26]: np.nonzero(~a2_mask.mask)
Out[26]: (array([ 2, 3, 4, 9, 10]),)
In [27]: np.nonzero(~a2_mask.mask)[0][2]
Out[27]: 4
In [28]: a1[4]
Out[28]: 3.2
So you need the mask or indices used to select a2 in the first place. a2 itself does not have the information.
In [30]: np.nonzero(a1>=1)
Out[30]: (array([ 2, 3, 4, 9, 10]),)
In [31]: np.nonzero(a1 >= 1)[0][2]
Out[31]: 4
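Wrapped up as a function, the same lookup might look like the sketch below. It assumes a 1-D array and takes the boolean keep-mask (a1 >= 1) rather than the masked array; the name reframe_index comes from the question, while the keep parameter name is my own.

```python
import numpy as np

def reframe_index(filtered_idx, keep):
    """Map indices into a1[keep] back to indices into a1 (1-D only)."""
    # Positions in the original array that survived the filter, in order.
    original_positions = np.flatnonzero(keep)
    # Indexing works for a single index or an array of indices.
    return original_positions[filtered_idx]

a1 = np.array([.1, .2, 23., 4.3, 3.2, .1, .05, .2, .3, 4.2, 7.6])
keep = a1 >= 1
a2 = a1[keep]
a2_idx = np.flatnonzero(a2 == 3.2)[0]
a1_idx = reframe_index(a2_idx, keep)
print(a1_idx)  # 4
```

Starting from the masked array instead, ~np.ma.getmaskarray(a2_mask) should give the same keep mask.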
Upvotes: 1