Reputation: 12509
I have a set of data points over time, but some data is missing and the points are not at regular intervals. To get a full data set at regular intervals, I did the following:
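import pandas as pd
import numpy as np
from scipy import interpolate
x = data['time']
y = data['shares']
# Linear interpolant; 'extrapolate' allows evaluation outside the range of x
f = interpolate.interp1d(x, y, fill_value='extrapolate')
# Regular grid with a step of 600
time = np.arange(0, 3780060, 600)
new_data = []
for interval in time:
    new_data.append(f(interval))
# Build the regularly spaced data set from the interpolated values
test = pd.DataFrame({'time': time, 'shares': new_data})
test = test.astype(float)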
When both the original and the extrapolated data sets are plotted, they seem to line up almost perfectly, but I still wonder if there is a more efficient and/or accurate way to accomplish the above.
Upvotes: 0
Views: 104
Reputation: 487
You should apply the interpolation function only once, to the whole array, like this:
new_data = f(time)
If the new time grid lies within the range of the original data, fill_value='extrapolate' is redundant, because every value is obtained by plain interpolation. You need 'extrapolate' only if the new grid extends beyond the original range, but extrapolating like that is bad practice.
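To illustrate both points, here is a minimal sketch. The sample arrays are made up and only stand in for data['time'] and data['shares']; the grid is deliberately kept inside the original range so that no extrapolation is needed. np.interp is shown as a possible alternative, not as something the question requires.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

# Hypothetical irregular samples standing in for data['time'] and data['shares'].
x = np.array([0.0, 500.0, 1300.0, 1800.0, 3100.0])
y = np.array([10.0, 15.0, 23.0, 28.0, 41.0])

# Regular grid that stays inside [x.min(), x.max()], so nothing is extrapolated.
time = np.arange(x.min(), x.max() + 1, 600)

# One vectorized call instead of a Python loop; the result is a float array.
f = interpolate.interp1d(x, y)
new_data = f(time)

# np.interp does the same piecewise-linear interpolation without SciPy.
new_data_np = np.interp(time, x, y)

result = pd.DataFrame({'time': time, 'shares': new_data})
```

Because f(time) already returns a float array, the astype(float) step is no longer needed. Note that the two calls behave differently outside the original range: interp1d without fill_value raises a ValueError by default, while np.interp returns the first or last y value.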
Upvotes: 1