kanerm

Reputation: 11

Return value of highest index in numpy 3D array

I have a 3D NumPy array that includes NaNs. For each position, I need to return the non-NaN value with the greatest index along axis 0, so the answer reduces to a 2D array.

There are a lot of questions about finding the index position of a maximum value along an axis (How to get the index of a maximum element in a numpy array along one axis), but that is different from what I need.

Example 3D array:

>>> import numpy as np
>>> foo = np.asarray([[[7,4,6],[4,2,11], [7,8,9], [4,8,2]],[[1,2,3],[np.nan,5,8], [np.nan,np.nan,10], [np.nan,np.nan,7]]])
>>> foo
array([[[  7.,   4.,   6.],
        [  4.,   2.,  11.],
        [  7.,   8.,   9.],
        [  4.,   8.,   2.]],

       [[  1.,   2.,   3.],
        [ nan,   5.,   8.],
        [ nan,  nan,  10.],
        [ nan,  nan,   7.]]])

I thought I was getting close using np.where, but it returns all the elements that are not NaN as a flat array. That is not quite what I need, because I want a (4, 3) array.

>>> zoo = foo[np.where(~np.isnan(foo))]
>>> zoo
array([  7.,   4.,   6.,   4.,   2.,  11.,   7.,   8.,   9.,   4.,   8.,
     2.,   1.,   2.,   3.,   5.,   8.,  10.,   7.])
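The result is 1-D because indexing with a boolean mask (which is what foo[np.where(mask)] amounts to) keeps a different number of elements at each (row, column) position, so NumPy can only return them flattened. A quick check, rebuilding the same foo array:

```python
import numpy as np

nan = np.nan
foo = np.asarray([[[7, 4, 6], [4, 2, 11], [7, 8, 9], [4, 8, 2]],
                  [[1, 2, 3], [nan, 5, 8], [nan, nan, 10], [nan, nan, 7]]])

mask = ~np.isnan(foo)          # same shape as foo: (2, 4, 3)
flat = foo[mask]               # boolean indexing always returns a 1-D array
print(mask.shape, flat.shape)  # (2, 4, 3) (19,)

# Number of non-NaN values kept at each (row, column) position along axis 0
counts = mask.sum(axis=0)
print(counts)
# [[2 2 2]
#  [1 2 2]
#  [1 1 2]
#  [1 1 2]]
```

Since the counts differ from position to position, the flat result cannot be reshaped back to (4, 3); exactly one element has to be picked per position instead.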

The answer I need is:

>>> ans = np.asarray([[1,2,3], [4,5,8], [7,8,10], [4,8,7]])
>>> ans
array([[ 1,  2,  3],
       [ 4,  5,  8],
       [ 7,  8, 10],
       [ 4,  8,  7]])

EDIT: I edited the foo example array to make the question more clear.

Upvotes: 1

Views: 627

Answers (2)

B. M.

Reputation: 18668

A vectorized solution, using only indexing:

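import numpy as np

def last_non_nan(foo):
    # i: index of the first non-NaN value in each line of the array reversed along axis 0
    i = np.isnan(foo)[::-1].argmin(0)
    # j, k: row and column index grids matching the 2D result
    j, k = np.indices(foo[0].shape)
    return foo[-i-1, j, k]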

For each (row, column) position, i contains the index of the first non-NaN value in the reversed line along axis 0, so -i-1 is that value's (negative) index in the original, non-reversed line.
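To make the index arithmetic concrete, here is i computed for the example foo. argmin returns the first position of the minimum, and False (0) is smaller than True (1), so it lands on the first non-NaN entry of each reversed line:

```python
import numpy as np

nan = np.nan
foo = np.asarray([[[7, 4, 6], [4, 2, 11], [7, 8, 9], [4, 8, 2]],
                  [[1, 2, 3], [nan, 5, 8], [nan, nan, 10], [nan, nan, 7]]])

# Reverse along axis 0, then find the first False (first non-NaN) in each line
i = np.isnan(foo)[::-1].argmin(0)
print(i)
# [[0 0 0]
#  [1 0 0]
#  [1 1 0]
#  [1 1 0]]

# -i-1 is a negative index into the original axis 0;
# foo.shape[0] - 1 - i is the equivalent non-negative index.
neg_idx = -i - 1
pos_idx = foo.shape[0] - 1 - i
```

If every value in a line is NaN, argmin returns 0 for that line, so -i-1 is -1 and last_non_nan returns the last element of the line, which is NaN.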

>>> last_non_nan(foo)
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  8.],
       [ 7.,  8., 10.],
       [ 4.,  8.,  7.]])
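The same idea can also be written with np.take_along_axis (available since NumPy 1.15), which avoids building the j, k grids by hand. This is a swapped-in variant, not the code from the answer above; the name last_non_nan_take is hypothetical:

```python
import numpy as np

def last_non_nan_take(a):
    """Hypothetical variant of last_non_nan built on np.take_along_axis."""
    n = a.shape[0]
    # Non-negative index of the last non-NaN value along axis 0;
    # all-NaN lines get n - 1, i.e. their last element (NaN).
    idx = n - 1 - np.isnan(a)[::-1].argmin(axis=0)
    # take_along_axis needs idx to have the same ndim as a, so restore axis 0
    return np.take_along_axis(a, idx[np.newaxis], axis=0)[0]

nan = np.nan
foo = np.asarray([[[7, 4, 6], [4, 2, 11], [7, 8, 9], [4, 8, 2]],
                  [[1, 2, 3], [nan, 5, 8], [nan, nan, 10], [nan, nan, 7]]])

result = last_non_nan_take(foo)   # same (4, 3) answer as last_non_nan(foo)
print(result)
```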

It is faster than the np.apply_along_axis approach with highest_index from the other answer:

In [5]: %timeit last_non_nan(foo)
133 µs ± 29.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [6]: %timeit np.apply_along_axis(highest_index,0,foo)
667 µs ± 90 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

It is up to 150x faster (40 ms vs. 6 s) for a (10, 400, 400) array with 90% NaNs.

This is essentially because last_non_nan fetches only the last non-NaN value of each line, whereas highest_index builds a mask and extracts all the non-NaN values of each line before keeping the last one.
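A sketch for reproducing the comparison on your own machine, assuming a randomly generated array with roughly 90% NaNs. It defaults to a smaller (10, 100, 100) shape to keep the run short (the answer used (10, 400, 400)), and it keeps layer 0 NaN-free because highest_index cannot handle a line that is entirely NaN. Timings depend on the machine, so no numbers are claimed here:

```python
import timeit
import numpy as np

def last_non_nan(foo):
    i = np.isnan(foo)[::-1].argmin(0)
    j, k = np.indices(foo[0].shape)
    return foo[-i - 1, j, k]

def highest_index(a):
    return a[~np.isnan(a)][-1]

shape = (10, 100, 100)                  # the answer used (10, 400, 400)
rng = np.random.default_rng(0)
a = rng.random(shape)
a[rng.random(shape) < 0.9] = np.nan     # roughly 90% NaNs
a[0] = rng.random(shape[1:])            # keep layer 0 NaN-free

# Both approaches must agree before comparing speed
expected = np.apply_along_axis(highest_index, 0, a)
assert np.array_equal(last_non_nan(a), expected)

t_vec = timeit.timeit(lambda: last_non_nan(a), number=10) / 10
t_loop = timeit.timeit(lambda: np.apply_along_axis(highest_index, 0, a), number=1)
print(f"last_non_nan: {t_vec * 1e3:.2f} ms, apply_along_axis: {t_loop * 1e3:.2f} ms")
```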

Upvotes: 0

You can use np.nanmax:

>>> np.nanmax(foo, axis=0)
array([[ 7.,  4.,  6.],
       [ 4.,  5., 11.],
       [ 7.,  8., 10.],
       [ 4.,  8.,  7.]])

The np.nanmax function returns the maximum of an array or maximum along an axis, ignoring any NaNs.

EDIT

As you rightly point out in your comment, you need the last non-NaN value along axis 0 (the value at the highest index), not the maximum value, and the code above doesn't return that.

Instead, you can use np.apply_along_axis:

>>> def highest_index(a):
...     return a[~np.isnan(a)][-1] # return non-nan value at highest index

>>> np.apply_along_axis(highest_index, 0, foo)
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  8.],
       [ 7.,  8., 10.],
       [ 4.,  8.,  7.]])
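Note that highest_index raises IndexError when a line along axis 0 is entirely NaN, because a[~np.isnan(a)] is then empty. A minimal sketch of a guarded version that returns NaN for such lines instead (the name highest_index_or_nan is hypothetical):

```python
import numpy as np

def highest_index_or_nan(a):
    """Hypothetical variant of highest_index: return NaN instead of raising
    IndexError when every value in the 1-D line `a` is NaN."""
    valid = a[~np.isnan(a)]
    return valid[-1] if valid.size else np.nan

nan = np.nan
bar = np.array([[[1.0, nan]],
                [[nan, nan]]])   # shape (2, 1, 2); column 1 is all NaN

result = np.apply_along_axis(highest_index_or_nan, 0, bar)
print(result)
```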

Upvotes: 1
