Reputation:
I have a one-dimensional array A of floats that is mostly good, but a few of the values are missing. Missing data is replaced with nan (not a number). I have to replace the missing values in the array by linear interpolation from the nearby good values. So, for example:
F7(np.array([10.,20.,nan,40.,50.,nan,30.]))
should return
np.array([10.,20.,30.,40.,50.,40.,30.]).
What's the best way of doing this using Python?
Any help would be much appreciated
Thanks
Upvotes: 11
Views: 28017
Reputation: 94
To avoid creating a new Series object, or new items in the Series, every time you want to interpolate data, use RedBlackPy. See the code example below:
import redblackpy as rb
# we do not include missing data
index = [0,1,3,4,6]
data = [10,20,40,50,30]
# create Series object
series = rb.Series(index=index, values=data, dtype='float32',
interpolate='linear')
# Now you can read the value at any key via linear interpolation
# Interpolation does not create new items in the Series
print(series[2]) # prints 30
print(series[5]) # prints 40
# print Series and see that keys 2 and 5 do not exist in series
print(series)
The last print produces the following output:
Series object Untitled
0: 10.0
1: 20.0
3: 40.0
4: 50.0
6: 30.0
Upvotes: 0
Reputation: 363487
You could use scipy.interpolate.interp1d:
>>> from scipy.interpolate import interp1d
>>> import numpy as np
>>> x = np.array([10., 20., np.nan, 40., 50., np.nan, 30.])
>>> not_nan = np.logical_not(np.isnan(x))
>>> indices = np.arange(len(x))
>>> interp = interp1d(indices[not_nan], x[not_nan])
>>> interp(indices)
array([ 10., 20., 30., 40., 50., 40., 30.])
EDIT: it took me a while to figure out how np.interp works, but that can do the job as well:
>>> np.interp(indices, indices[not_nan], x[not_nan])
array([ 10., 20., 30., 40., 50., 40., 30.])
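If you need this as a reusable function, the np.interp call above can be wrapped roughly as follows. This is only a sketch: the name fill_nan_linear is made up for illustration, and it assumes a 1-D input. Note one difference between the two approaches: np.interp holds the first and last good values constant outside the known range, so NaNs at the very start or end get filled with the nearest good value, whereas interp1d with its default settings raises a ValueError for points outside the range.

```python
import numpy as np

def fill_nan_linear(values):
    """Return a copy of a 1-D float array with NaNs filled by linear interpolation.

    Hypothetical helper name, not part of NumPy.
    """
    values = np.asarray(values, dtype=float)
    good = ~np.isnan(values)
    if not good.any():
        # Nothing to interpolate from; hand back an unchanged copy.
        return values.copy()
    indices = np.arange(len(values))
    # np.interp clamps to the first/last good value outside the known range,
    # so leading and trailing NaNs take the nearest good value.
    return np.interp(indices, indices[good], values[good])

result = fill_nan_linear([10., 20., np.nan, 40., 50., np.nan, 30.])
print(result)
```

If clamping at the edges is not what you want, you would have to handle leading and trailing NaNs separately before or after the call.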
Upvotes: 16
Reputation: 80336
I would go with pandas. A minimalistic approach with a one-liner:
import numpy as np
from pandas import Series
a = np.array([10., 20., np.nan, 40., 50., np.nan, 30.])
Series(a).interpolate()
Out[219]:
0 10
1 20
2 30
3 40
4 50
5 40
6 30
Or if you want to keep it as an array:
Series(a).interpolate().values
Out[221]:
array([ 10., 20., 30., 40., 50., 40., 30.])
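One thing worth checking with this approach is what happens to NaNs at the ends of the array. The sketch below uses a made-up input (not the one from the question) to show the behaviour as I understand it: with default settings a leading NaN is left alone and a trailing NaN is filled with the last good value, while limit_area="inside" restricts filling to gaps that have good values on both sides.

```python
import numpy as np
import pandas as pd

# Illustrative input with NaNs at both ends (not taken from the question).
s = pd.Series([np.nan, 1., np.nan, 3., np.nan])

# Default settings: the interior gap is interpolated, the leading NaN is kept,
# and the trailing NaN is filled with the last good value.
filled = s.interpolate().to_numpy()

# limit_area="inside" only fills NaNs surrounded by valid values.
inside_only = s.interpolate(limit_area="inside").to_numpy()

print(filled)
print(inside_only)
```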
Upvotes: 9