How to get the distance to the closest previous finite number in a row using Numpy

Question

I'm stuck on something that I think could be solved in a couple of lines with NumPy, but I just don't see it. Let's define an example array containing some missing values:

import numpy as np
input_data = np.array([[1,3,5,8,6],[3,np.nan,np.nan,5,6],[np.nan,6,7,np.nan,2]])

Out[530]: [[1, 3, 5, 8, 6], [3, nan, nan, 5, 6], [nan, 6, 7, nan, 2]]

What I'm looking for is an array that gives, for each element, the distance (in columns) to the previous valid, non-NaN value in the same row. In the example above, this would be something like:

delta_valid = [[nan, 1, 1, 1, 1], [nan, 1, 2, 3, 1], [nan, nan, 1, 1, 2]]

The first element in each row would always be NaN because there is no previous value (not sure if there's a better way to define this).

Can anyone help me get this result in NumPy? Thank you very much!

Divakar · Accepted Answer

You are basically making ranges of (1,2,3,...) until the next non-NaN. To solve such cases, we could use some diff + cumsum magic on each row, as shown below -

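import numpy as np

def closest_distance_per_row(a):
    # Per-column increments; their cumulative sum gives the running distance
    m0 = np.ones(a.shape, dtype=int)
    mask = ~np.isnan(a)
    for i in range(a.shape[0]):
        idx = np.flatnonzero(mask[i])  # columns holding valid values
        if len(idx) > 0:
            # Before the first valid value, keep the running sum at zero
            m0[i, :idx[0]] = 0
            # At each later valid column, reset the running sum to 1
            m0[i, idx[1:]] = idx[:-1] - idx[1:] + 1

    out = np.full(a.shape, np.nan, dtype=float)
    # Shift right by one so column j only sees valid values before j
    out[:, 1:] = m0[:, :-1].cumsum(1)
    out[out == 0] = np.nan        # zero means no previous valid value
    out[~mask.any(1)] = np.nan    # rows without any valid value
    return out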
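
To see why the negative increments restart the count, it may help to trace a single row by hand. This is only a minimal sketch for the second row of the question's example, which starts with a valid value; the names `row`, `steps` and `dist` are made up for the illustration.

```python
import numpy as np

row = np.array([3, np.nan, np.nan, 5, 6])

# Columns that hold a valid (non-NaN) value: [0, 3, 4]
idx = np.flatnonzero(~np.isnan(row))

# Start from an increment of 1 per column ...
steps = np.ones(row.size, dtype=int)
# ... and at every valid column after the first, subtract the gap to the
# previous valid column (minus one), so the running sum drops back to 1.
steps[idx[1:]] = idx[:-1] - idx[1:] + 1   # steps is now [1, 1, 1, -2, 0]

# The running sum, shifted right by one column, is the distance to the
# previous valid value for columns 1..4.
dist = steps[:-1].cumsum()                # [1, 2, 3, 1]
print(dist)
```

Column 0 has no previous value, so it stays NaN in the full function.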

Sample runs -

In [353]: a
Out[353]: 
array([[  1.,   3.,   5.,   8.,   6.],
       [  3.,  nan,  nan,   5.,   6.],
       [ nan,   6.,   7.,  nan,   2.]])

In [354]: closest_distance_per_row(a)
Out[354]: 
array([[ nan,   1.,   1.,   1.,   1.],
       [ nan,   1.,   2.,   3.,   1.],
       [ nan,  nan,   1.,   1.,   2.]])

In [343]: a
Out[343]: 
array([[ nan,  nan,  nan,  nan,  nan,  nan,   4.,  nan,   3.,   1.],
       [ nan,  nan,   6.,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [  0.,  nan,   2.,  nan,   1.,  nan,   0.,  nan,  nan,  nan],
       [  3.,  nan,   2.,  nan,   8.,   6.,  nan,   4.,   2.,  nan],
       [ nan,   0.,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,   2.,  nan,   0.,  nan,  nan,   1.,  nan,  nan]])

In [344]: closest_distance_per_row(a)
Out[344]: 
array([[ nan,  nan,  nan,  nan,  nan,  nan,  nan,   1.,   2.,   1.],
       [ nan,  nan,  nan,   1.,   2.,   3.,   4.,   5.,   6.,   7.],
       [ nan,   1.,   2.,   1.,   2.,   1.,   2.,   1.,   2.,   3.],
       [ nan,   1.,   2.,   1.,   2.,   1.,   1.,   2.,   1.,   1.],
       [ nan,  nan,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.],
       [ nan,  nan,  nan,   1.,   2.,   1.,   2.,   3.,   1.,   2.]])

Runtime test -

In [4]: a = np.random.randint(0,9,(5000,5000)).astype(float)

In [5]: a.ravel()[np.random.choice(a.size, int(a.size*0.5), replace=0)] = np.nan

In [6]: %timeit two_loops(a)
1 loops, best of 3: 16.7 s per loop

In [7]: %timeit closest_distance_per_row(a)
1 loops, best of 3: 339 ms per loop

In [8]: 16700/339.0 # Speedup with one loop (proposed in this post) over two loops
Out[8]: 49.26253687315634
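
The remaining per-row loop could probably be dropped as well. The following is a sketch of a different technique from the one above, not timed here: record the column index of every valid value, then carry the most recent index forward along each row with `np.maximum.accumulate`. The function name `closest_distance_accumulate` is hypothetical, and a 2D float input is assumed.

```python
import numpy as np

def closest_distance_accumulate(a):
    # Assumes a 2D array; NaN marks missing values
    a = np.asarray(a, dtype=float)
    cols = np.arange(a.shape[1])

    # Column index where the value is valid, -1 where it is NaN
    last = np.where(~np.isnan(a), cols, -1)
    # Running maximum along each row: index of the most recent valid
    # value at or before each column (-1 if none seen yet)
    last = np.maximum.accumulate(last, axis=1)

    out = np.full(a.shape, np.nan)
    # For column j >= 1, the previous valid index is last[:, j-1]
    prev = last[:, :-1]
    out[:, 1:] = np.where(prev >= 0, cols[1:] - prev, np.nan)
    return out

input_data = np.array([[1, 3, 5, 8, 6],
                       [3, np.nan, np.nan, 5, 6],
                       [np.nan, 6, 7, np.nan, 2]])
result = closest_distance_accumulate(input_data)
print(result)
```

On the question's example this gives the same array as `closest_distance_per_row`.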
