Reputation: 1423
The results are correct, but in my real problem the data are too large, so I want to apply the interpolation directly, without using a for loop. Any ideas would be appreciated.
import numpy as np
from scipy.interpolate import interp1d

data = np.array([[99, 0, 3, 4, 5],
                 [6, 7, 0, 9, 10],
                 [11, 22, 0, 14, 15]], dtype=np.float32)
data[data == 0] = np.nan

def gap_fill(y):
    # linearly interpolate over the NaN gaps using the non-NaN points
    not_nan = ~np.isnan(y)
    x = np.arange(len(y))
    interp = interp1d(x[not_nan], y[not_nan], kind='linear')
    ynew = interp(x)
    return ynew

results = []
for d in data:
    gapfilled = gap_fill(d)
    results.append(gapfilled)
print(results)
[array([ 99., 51., 3., 4., 5.]), array([ 6., 7., 8., 9., 10.]), array([ 11., 22., 18., 14., 15.])]
Upvotes: 2
Views: 1365
Reputation: 231738
What I was thinking of, on the spur of the moment, was:
In [8]: gap_fill(data.flatten()).reshape(data.shape)
Out[8]:
array([[ 99., 51., 3., 4., 5.],
[ 6., 7., 8., 9., 10.],
[ 11., 22., 18., 14., 15.]])
That works for your example because all the nan are internal to the rows. However, for gaps at the ends of the rows, flattening turns extrapolation into interpolation across row boundaries, which you probably don't want.
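To see the cross-row problem, here's a minimal sketch using your gap_fill on a made-up 2x3 array (my example data, not yours) with a NaN at the end of the first row:

```python
import numpy as np
from scipy.interpolate import interp1d

def gap_fill(y):
    not_nan = ~np.isnan(y)
    x = np.arange(len(y))
    return interp1d(x[not_nan], y[not_nan], kind='linear')(x)

# NaN at the END of the first row
data = np.array([[1., 2., np.nan],
                 [10., 11., 12.]])

filled = gap_fill(data.flatten()).reshape(data.shape)
# The flattened x is [0,1,2,3,4,5]; the NaN at flat index 2 is
# interpolated between y[1]=2 and y[3]=10 (the start of the NEXT row):
# 2 + (10 - 2) * (2 - 1) / (3 - 1) = 6.0
# Row-wise extrapolation would have given 3.0 instead.
```

So the flatten trick silently pulls in the neighboring row's value for edge gaps.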
Strictly speaking, linear interpolation finds a value BETWEEN two points, (1-a)*x1 + a*x2 where 0 <= a <= 1. If a is outside that range, that's linear extrapolation.
The default action in interp1d is to raise an error in extrapolation cases. Since your iterative gap_fill runs without error, you must not have any extrapolation cases, in which case my flatten solution should work fine.
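To illustrate that default, here's a small sketch. Evaluating outside the x range raises ValueError; if you do want extrapolation, fill_value='extrapolate' (available in SciPy 0.17+, if I recall the version correctly) switches it on:

```python
import numpy as np
from scipy.interpolate import interp1d

f = interp1d([0., 1.], [0., 2.], kind='linear')
inside = f(0.5)          # 1.0 -- interpolation, within [0, 1]

try:
    f(2.0)               # outside [0, 1]: ValueError by default
    raised = False
except ValueError:
    raised = True

# opt in to linear extrapolation instead
f2 = interp1d([0., 1.], [0., 2.], kind='linear',
              fill_value='extrapolate')
outside = f2(2.0)        # 4.0 -- same slope, extended past x=1
```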
It doesn't look like interp1d uses any compiled C code for linear interpolation. Also, looking at its documentation, you might gain some speed by adding copy=False, assume_sorted=True.
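For example (a sketch with made-up data): copy=False avoids internal copies of x and y, and assume_sorted=True skips the sort, which is safe here because np.arange is already increasing:

```python
import numpy as np
from scipy.interpolate import interp1d

y = np.array([1., np.nan, 3., 4.])
x = np.arange(len(y))
m = ~np.isnan(y)

# copy=False: don't copy the input arrays internally
# assume_sorted=True: x[m] is already increasing, skip the argsort
f = interp1d(x[m], y[m], kind='linear',
             copy=False, assume_sorted=True)
ynew = f(x)   # the NaN at index 1 becomes 2.0
```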
Its core action is:
slope = (y_hi - y_lo) / (x_hi - x_lo)[:, None]
y_new = slope*(x_new - x_lo)[:, None] + y_lo
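You can apply those two lines directly to your own arrays. A sketch with made-up bracketing points: x_lo/x_hi bracket each query point x_new, and the [:, None] broadcasts the per-point slope across multiple y columns at once:

```python
import numpy as np

# two query points, each bracketed by (x_lo, x_hi); y has two columns
x_lo = np.array([0., 1.])
x_hi = np.array([1., 3.])
y_lo = np.array([[0., 10.],
                 [2., 12.]])
y_hi = np.array([[2., 12.],
                 [6., 16.]])
x_new = np.array([0.5, 2.0])

# the core of interp1d's linear case, vectorized over points and columns
slope = (y_hi - y_lo) / (x_hi - x_lo)[:, None]
y_new = slope * (x_new - x_lo)[:, None] + y_lo
# row 0: halfway between (0,10) and (2,12) -> (1, 11)
# row 1: halfway between (2,12) and (6,16) -> (4, 14)
```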
Upvotes: 2