Reputation: 11
I have a 3D array in numpy that includes NaNs. For each position, I need to return the non-NaN value with the highest index along axis 0. The answer would reduce to a 2D array.
There are a lot of questions about finding the index position of a maximum value along an axis (How to get the index of a maximum element in a numpy array along one axis), but that is different than what I need.
Example 3D array:
>>> import numpy as np
>>> foo = np.asarray([[[7,4,6],[4,2,11], [7,8,9], [4,8,2]],[[1,2,3],[np.nan,5,8], [np.nan,np.nan,10], [np.nan,np.nan,7]]])
>>> foo
array([[[ 7., 4., 6.],
[ 4., 2., 11.],
[ 7., 8., 9.],
[ 4., 8., 2.]],
[[ 1., 2., 3.],
[ nan, 5., 8.],
[ nan, nan, 10.],
[ nan, nan, 7.]]])
I thought I was getting close using np.where, but it returns all the non-NaN elements as a flat array. That is not quite what I need, because I want a (4,3) array.
>>> zoo = foo[np.where(~np.isnan(foo))]
>>> zoo
array([ 7., 4., 6., 4., 2., 11., 7., 8., 9., 4., 8.,
2., 1., 2., 3., 5., 8., 10., 7.])
The answer I need is:
>>> ans = np.asarray([[1,2,3], [4,5,8], [7,8,10], [4,8,7]])
>>> ans
array([[ 1, 2, 3],
[ 4, 5, 8],
[ 7, 8, 10],
[ 4, 8, 7]])
EDIT: I edited the foo example array to make the question more clear.
Upvotes: 1
Views: 627
Reputation: 18668
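A vectorized solution, using only indexing:
def last_non_nan(foo):
    # position of the first non-NaN value when axis 0 is read backwards
    i = np.isnan(foo)[::-1].argmin(0)
    # row and column index grids matching one 2D layer of foo
    j, k = np.indices(foo[0].shape)
    return foo[-i-1, j, k]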
i contains the index of the first non-NaN number in the reversed 'line' (the values along axis 0 at a given position), so -i-1 is its index in the original, unreversed line.
>>> last_non_nan(foo)
array([[ 1., 2., 3.],
[ 4., 5., 8.],
[ 7., 8., 10.],
[ 4., 8., 7.]])
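To make the reversed-argmin step easier to follow, here is a small sketch that prints the intermediate arrays for the question's foo. The variable names nan_mask and reversed_mask are only introduced here for illustration.

```python
import numpy as np

foo = np.asarray([[[7, 4, 6], [4, 2, 11], [7, 8, 9], [4, 8, 2]],
                  [[1, 2, 3], [np.nan, 5, 8], [np.nan, np.nan, 10], [np.nan, np.nan, 7]]])

nan_mask = np.isnan(foo)         # True where foo is NaN, shape (2, 4, 3)
reversed_mask = nan_mask[::-1]   # flip axis 0 so the last layer comes first
i = reversed_mask.argmin(0)      # first False (non-NaN) entry in the reversed order
j, k = np.indices(foo[0].shape)  # row and column index grids, each of shape (4, 3)
result = foo[-i - 1, j, k]       # -i-1 converts the reversed position to a negative index

print(i)       # 0 where the last layer is valid, 1 where the first layer must be used
print(result)
```

Note that for a position where every layer is NaN, argmin returns 0, so the function simply returns the NaN from the last layer.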
It is faster than highest_index:
In [5]: %timeit last_non_nan(foo)
133 µs ± 29.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [6]: %timeit np.apply_along_axis(highest_index,0,foo)
667 µs ± 90 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
It is up to 150x faster (40 ms vs 6 s) for a (10,400,400) array with 90% NaNs.
This is essentially because last_non_nan fetches only the last non-NaN value in each line, whereas highest_index computes a boolean index and extracts all the non-NaN values of each line before taking the last one.
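A rough way to reproduce this kind of comparison outside IPython is sketched below. The helper make_test_array is hypothetical, and it keeps the first layer free of NaNs (as in the question's example) so that highest_index always finds a value; that is an assumption of this sketch, not something stated above. The array is kept small so the script runs quickly; timings will vary by machine.

```python
import timeit
import numpy as np

def last_non_nan(foo):
    i = np.isnan(foo)[::-1].argmin(0)
    j, k = np.indices(foo[0].shape)
    return foo[-i - 1, j, k]

def highest_index(a):
    return a[~np.isnan(a)][-1]

def make_test_array(shape, nan_fraction=0.9, seed=0):
    """Hypothetical helper: random data with NaNs, first layer kept NaN-free."""
    rng = np.random.default_rng(seed)
    data = rng.random(shape)
    mask = rng.random(shape) < nan_fraction
    mask[0] = False  # assumption: every line keeps at least one valid value
    data[mask] = np.nan
    return data

# Small shape for a quick run; try (10, 400, 400) for the larger case.
big = make_test_array((10, 40, 40))

fast = last_non_nan(big)
slow = np.apply_along_axis(highest_index, 0, big)
assert np.array_equal(fast, slow)  # both approaches agree on this data

runs = 5
t_fast = timeit.timeit(lambda: last_non_nan(big), number=runs) / runs
t_slow = timeit.timeit(lambda: np.apply_along_axis(highest_index, 0, big), number=runs) / runs
print(f"last_non_nan:  {t_fast:.5f} s per call")
print(f"highest_index: {t_slow:.5f} s per call")
```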
Upvotes: 0
Reputation: 3001
You can use np.nanmax:
>>> np.nanmax(foo, axis=0)
array([[ 7., 4., 6.],
[ 4., 5., 11.],
[ 7., 8., 10.],
[ 4., 8., 7.]])
The np.nanmax function returns the maximum of an array or the maximum along an axis, ignoring any NaNs.
As you rightly point out in your comment, you need the value at the maximum index and the code above doesn't return that.
Instead, you can use np.apply_along_axis:
>>> def highest_index(a):
...     return a[~np.isnan(a)][-1] # return non-nan value at highest index
...
>>> np.apply_along_axis(highest_index, 0, foo)
array([[ 1., 2., 3.],
[ 4., 5., 8.],
[ 7., 8., 10.],
[ 4., 8., 7.]])
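One caveat: if some position is NaN in every layer, a[~np.isnan(a)] is empty and the [-1] lookup raises an IndexError. A minimal sketch of a guarded variant follows; the name highest_index_safe and the choice to return NaN in that case are assumptions made for illustration, and bar is a made-up test array.

```python
import numpy as np

def highest_index_safe(a):
    valid = a[~np.isnan(a)]                   # all non-NaN values along the line
    return valid[-1] if valid.size else np.nan  # fall back to NaN for an all-NaN line

# Shape (2, 1, 2): the second position is NaN in both layers.
bar = np.array([[[1.0, np.nan]],
                [[np.nan, np.nan]]])

out = np.apply_along_axis(highest_index_safe, 0, bar)
print(out)
```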
Upvotes: 1