abenhamou
abenhamou

Reputation: 41

How to interpolate/extrapolate within partly empty regular grid?

I would like to create a python function to linearly interpolate within a partly empty grid and get a nearest extrapolation out of bounds.

Let's say I have the following data stored in pandas DataFrame:

In [1]: import numpy as np
In [2]: import pandas as pd

In [3]: x = [0,1,2,3,4]
In [4]: y = [0.5,1.5,2.5,3.5,4.5,5.5]
In [5]: z = np.array([[np.nan,np.nan,1.5,2.0,5.5,3.5],[np.nan,1.0,4.0,2.5,4.5,3.0],[2.0,0.5,6.0,1.5,3.5,np.nan],[np.nan,1.5,4.0,2.0,np.nan,np.nan],[np.nan,np.nan,2.0,np.nan,np.nan,np.nan]])
In [6]: df = pd.DataFrame(z,index=x,columns=y)
In [7]: df
Out[7]:
    0.5  1.5  2.5  3.5  4.5  5.5
 0  NaN  NaN  1.5  2.0  5.5  3.5
 1  NaN  1.0  4.0  2.5  4.5  3.0
 2  2.0  0.5  6.0  1.5  3.5  NaN
 3  NaN  1.5  4.0  2.0  NaN  NaN
 4  NaN  NaN  2.0  NaN  NaN  NaN 

I would like to get function myInterp that returns a linear interpolation within data boundaries (i.e. not NaN values) and get a nearest extrapolation outside bounds (i.e. NaN or no values) such as:

In [1]: myInterp([1.5,2.5]) #linear interpolation
Out[1]: 5.0

In [2]: myInterp([1.5,4.0]) #bi-linear interpolation
Out[2]: 3.0

In [3]: myInterp([0.0,2.0]) #nearest extrapolation (inside grid)
Out[3]: 1.5

In [4]: myInterp([5.0,2.5]) #nearest extrapolation (outside grid)
Out[4]: 2.0

I tried many combination of scipy.interpolate package with no success, does anyone have a suggestion how to do it ?

Upvotes: 0

Views: 2103

Answers (2)

abenhamou
abenhamou

Reputation: 41

Since scipy.interp2d doesn't deal with Nans, the solution is to fill the NaNs in the DataFrame before using interp2d. This can be done by using pandas.interpolate function.

In the previous example, the following provide the desired output:

In [1]: from scipy.interpolate import interp2d

In [2]: df = df.interpolate(limit_direction='both',axis=1,inplace=True)
In [3]: myInterp = interp2d(df.index,df.columns,df.values.T)

In [4]: myInterp(1.5,2.5)
Out[4]: array([5.])

In [5]: myInterp(1.5,4.0)
Out[5]: array([3.])

In [6]: myInterp(0.0,2.0)
Out[6]: array([1.5])

In [7]: myInterp(5.0,2.5)
Out[7]: array([2.])

Upvotes: 0

SpghttCd
SpghttCd

Reputation: 10880

Yes, unfortunately scipy doesn't deal with nans

From the docs:

Note that calling interp2d with NaNs present in input values results in undefined behaviour.

Even masking the nans in a np.masked_array was not successful.

So my advice would be to remove all the nan entries from z by taking the opportunity to give sp.interp2d the full list of x- and y-coordinates for only the valid data and leave z also 1D:

X=[];Y=[];Z=[]                     # initialize new 1-D-lists for interp2
for i, xi in enumerate(x):         # iterate through x
    for k, yk in enumerate(y):     # iterate through y
        if not np.isnan(z[i, k]):  # check if z-value is valid...
            X.append(xi)           # ...and if so, append coordinates and value to prepared lists
            Y.append(yk)
            Z.append(z[i, k])

This way at least sp.interp2d works and gives a result:

ip = sp.interpolate.interp2d(X,Y,Z)

However, the values in the result won't please you:

In: ip(x,y)
Out: 
array([[ 18.03583061,  -0.44933642,   0.83333333,  -1.        , -1.46105542],
       [  9.76791531,   1.3014037 ,   2.83333333,   1.5       ,  0.26947229],
       [  1.5       ,   3.05214381,   4.83333333,   4.        ,   2.        ],
       [  2.        ,   3.78378051,   1.5       ,   2.        ,   0.8364618 ],
       [  5.5       ,   3.57039277,   3.5       ,  -0.83019815,  -0.7967441 ],
       [  3.5       ,   3.29227922,  17.29607177,   0.        ,   0.        ]])

compared to the input data:

In:z
Out: 
array([[ nan,  nan,  1.5,  2. ,  5.5,  3.5],
       [ nan,  1. ,  4. ,  2.5,  4.5,  3. ],
       [ 2. ,  0.5,  6. ,  1.5,  3.5,  nan],
       [ nan,  1.5,  4. ,  2. ,  nan,  nan],
       [ nan,  nan,  2. ,  nan,  nan,  nan]])

But IMHO this is because the gradient changes in your data are far too high. Even more with respect to the low number of data samples.

I hope this is just a test data set and your real application has smoother gradients and some more samples. Then I'd be glad to hear if it works...

However, the trivial test with an array of zero gradient - only destructed by nans a little bit - could give a hint that interpolation should work, while extrapolation is only partly correct:

In:ip(x,y)
Out: 
array([[ 3.        ,  3.        ,  3.        ,  3.        ,  0.        ],
       [ 3.        ,  3.        ,  3.        ,  3.        ,  1.94701008],
       [ 3.        ,  3.        ,  3.        ,  3.        ,  3.        ],
       [ 3.        ,  3.        ,  3.        ,  3.        ,  1.54973345],
       [ 3.        ,  3.        ,  3.        ,  3.        ,  0.37706713],
       [ 3.        ,  3.        ,  2.32108317,  0.75435203,  0.        ]])

resulting from the trivial test input

In:z
Out: 
array([[ nan,  nan,   3.,   3.,   3.,   3.],
       [ nan,   3.,   3.,  nan,   3.,   3.],
       [  3.,   3.,   3.,   3.,   3.,  nan],
       [ nan,   3.,   3.,   3.,  nan,  nan],
       [ nan,  nan,   3.,  nan,  nan,  nan]])

PS: Looking closer to the right hand side: there are even valid entries completely changed, i.e made wrong, which introduces errors in a following analysis.

But surprise: the cubic version performs much better here:

In:ip = sp.interpolate.interp2d(X,Y,Z, kind='cubic')

In:ip(x,y)
Out: 
array([[ 3.        ,  3.        ,  3.        ,  3.02397028,  3.0958811 ],
       [ 3.        ,  3.        ,  3.        ,  3.        ,  3.        ],
       [ 3.        ,  3.        ,  3.        ,  3.        ,  3.        ],
       [ 3.        ,  3.        ,  3.        ,  3.        ,  3.        ],
       [ 3.        ,  3.        ,  3.        ,  2.97602972,  2.9041189 ],
       [ 3.        ,  3.        ,  3.        ,  2.9041189 ,  2.61647559]])

In:z
Out: 
array([[ nan,  nan,   3.,   3.,   3.,   3.],
       [ nan,   3.,   3.,  nan,   3.,   3.],
       [  3.,   3.,   3.,   3.,   3.,  nan],
       [ nan,   3.,   3.,   3.,  nan,  nan],
       [ nan,  nan,   3.,  nan,  nan,  nan]])

Upvotes: 1

Related Questions