Selecting values in ndarray occurring after a NaN

Question

I have a large 2D ndarray of floats, call it ar. It contains some NaNs. I am interested in the immediate neighbors to the right of each NaN (i.e. along axis=1). For example, if I know that, say, point (3, 7) is a NaN, I want to select ar[3, 8:8+N]. I then want to repeat this for every NaN location and vstack all the slices thus obtained.

I can locate the NaNs with np.where happily, and do a for loop over the values. Sadly, that's a bit slow. Is there an efficient way to do the indexing in a vectorised fashion? Given a list of tuples (x, y), I want to get, more or less,

result = np.vstack([ar[x, y+1:y+1+N] for x, y in tuples])

just without the looping. Is that possible?

Many thanks in advance.

Jaime · Accepted Answer

What you are asking for is ill-defined if a NaN occurs fewer than N columns from the right edge, but the following should work:

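import numpy as np

# Row and column index of every NaN in ar
rows, cols = np.where(np.isnan(ar))
# For each NaN, the N column indices immediately to its right, flattened
cols = (cols[:, None] + np.arange(1, N+1)).reshape(-1)
# Handle indices out of range by repeating the last column
cols = np.clip(cols, 0, ar.shape[1] - 1)
# Repeat each row index N times so it pairs up with its N columns
rows = np.repeat(rows, N)
# One row of N values per NaN
result = ar[rows, cols].reshape(-1, N)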

Making up some fake data:

>>> ar = np.random.rand(25)
>>> ar[np.random.randint(25, size=5)] = np.nan
>>> ar = ar.reshape(5, 5)
>>> N = 2

and running the above code on it yields:

>>> ar
array([[ 0.96556647,         nan,  0.02934316,  0.82174232,  0.29293098],
       [ 0.34819313,  0.57449136,         nan,         nan,  0.32791866],
       [ 0.14020414,  0.60668458,  0.95613773,  0.09533064,  0.43401037],
       [ 0.83888255,  0.34240687,         nan,  0.02495232,  0.36234979],
       [ 0.21870906,  0.24181006,  0.81447603,  0.24216213,         nan]])
>>> result
array([[ 0.02934316,  0.82174232],
       [        nan,  0.32791866],
       [ 0.32791866,  0.32791866],
       [ 0.02495232,  0.36234979],
       [        nan,         nan]])
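
If repeating the last column is not what you want near the right edge, one alternative is to mark the out-of-range positions explicitly. The following is only a minimal sketch under that assumption; the function name values_after_nan and its fill parameter are made up for illustration and are not part of NumPy:

```python
import numpy as np

def values_after_nan(ar, n, fill=np.nan):
    """Return the n values to the right of each NaN in a 2D float array.

    Positions past the last column are set to `fill`. This is an
    assumption of this sketch, not the clipping behaviour shown above.
    """
    rows, cols = np.where(np.isnan(ar))
    # One row of column indices per NaN: col+1 ... col+n
    cols = cols[:, None] + np.arange(1, n + 1)
    out_of_range = cols >= ar.shape[1]
    # Clip only so the index is valid; these entries are overwritten below
    safe_cols = np.minimum(cols, ar.shape[1] - 1)
    # rows[:, None] broadcasts against safe_cols, giving shape (num_nans, n)
    result = ar[rows[:, None], safe_cols]
    result[out_of_range] = fill
    return result

ar = np.array([[0.1, np.nan, 0.2, 0.3, 0.4],
               [0.5, 0.6, np.nan, np.nan, 0.7],
               [0.8, 0.9, 1.0, 1.1, np.nan]])
result = values_after_nan(ar, 2, fill=-1.0)
# rows of result: [0.2, 0.3], [nan, 0.7], [0.7, -1.0], [-1.0, -1.0]
print(result)
```

Using a sentinel such as -1.0 makes it possible to tell "ran off the edge" apart from "the neighbor is itself a NaN"; with the default fill=np.nan the two cases look the same.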
