TheNamekian

Reputation: 45

How to plot a smooth curve using interp1d for time-series data?

I have the following dataframe, called new_df:

     period1  intercept     error
0   2018-01-10 -33.707010  0.246193
1   2018-01-11 -36.151656  0.315618
2   2018-01-14 -37.846709  0.355960
3   2018-01-20 -37.170161  0.343631
4   2018-01-26 -31.785060  0.350386
..         ...        ...       ...
121 2020-05-03 -37.654889  0.489900
122 2020-05-06 -36.575763  0.559362
123 2020-06-10 -39.084314  0.756743
124 2020-06-11 -36.240442  0.705487
125 2020-06-14 -45.530748  0.991380

I am trying to plot a smooth curve (spline) with 'period1' on the x-axis and 'intercept' on the y-axis. Plotting this normally, without any interpolation, I get:

(image: plot of intercept against period1, without interpolation)

To smooth this curve, I have tried the following using the interp1d function from scipy:

from matplotlib import dates
from scipy.interpolate import interp1d
import numpy as np
import matplotlib.pyplot as plt

x = new_df.period1.values # convert period1 column to a numpy array
y = new_df.intercept.values # convert the intercept column to a numpy array
x_dates = np.array([dates.date2num(i) for i in x]) # period1 values are datetime objects, this line converts them to numbers

f = interp1d(x_dates, y, kind = 'cubic')
x_smooth = np.linspace(x_dates.min(), x_dates.max(), endpoint = True) # unsure if this line is right?

plt.plot(x_dates, y, 'o', x_smooth, f(x_smooth),'--')
plt.xlabel('Date')
plt.ylabel('Intercept')
plt.legend(['data', 'cubic spline'], loc = 'lower right')
plt.show()

This gives the output:

(image: plot of the data with the interpolated curve)

This is not the smooth curve I'm trying to get. Is there something I am doing wrong somewhere? Also, how can I revert the xticks back to dates?

NB. There isn't a fixed interval between the dates in the period1 column; they are completely random.

Any help is appreciated. Thanks!

Upvotes: 0

Views: 1529

Answers (1)

Richard

Reputation: 3394

Instead of interpolation (or perhaps in addition to it), try data smoothing, for example by convolution.

The basic concept is simple: replace the value at a point t with the average of that point and the points around it.

This removes the noise between adjacent points and makes the plot look more like the overall trend in the data.
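To make the averaging idea concrete, here is a minimal sketch of a moving average built on numpy.convolve. The helper name moving_average, the window size of 5 and the synthetic y array are all assumptions for illustration; y only stands in for new_df.intercept.values. Note that this averages over neighbouring rows, so it ignores the uneven spacing between your dates.

```python
import numpy as np

def moving_average(values, window=5):
    """Replace each point with the mean of itself and its neighbours (hypothetical helper)."""
    kernel = np.ones(window) / window  # equal weights that sum to 1
    # mode="valid" keeps only positions where the window fully overlaps the data,
    # so the result has len(values) - window + 1 points.
    return np.convolve(values, kernel, mode="valid")

# Synthetic stand-in for new_df.intercept.values (not the real data).
rng = np.random.default_rng(0)
y = -36 + rng.normal(scale=2.0, size=126)

y_avg = moving_average(y, window=5)
```

A larger window gives a smoother curve but flattens genuine peaks, so it is worth trying a few sizes.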

While it's easy to write this yourself or to use numpy.convolve, scipy has a specialized function for this, savgol_filter, which offers a few helpful features out of the box.

savgol_filter lives in scipy.signal, so you could check out the examples in its documentation.
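As a rough sketch of how savgol_filter might be applied here: the data below is synthetic (a sine trend plus noise standing in for the intercept column), and window_length=11 and polyorder=3 are guesses you would need to tune. Keep in mind that the filter treats the samples as evenly spaced, which your dates are not, so take the result as a visual aid rather than an exact fit.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic stand-in for new_df.intercept.values: a slow trend plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 126)
trend = -36 + 3 * np.sin(t)
y = trend + rng.normal(scale=1.5, size=t.size)

# window_length: number of neighbouring points used for each local fit
# (kept odd here, and it must be larger than polyorder).
# polyorder: degree of the polynomial fitted inside each window.
y_smooth = savgol_filter(y, window_length=11, polyorder=3)
```

Because y_smooth has the same length as the input, you should be able to plot it directly against the original column, e.g. plt.plot(new_df.period1, y_smooth), and matplotlib will keep the x-axis as dates without any date2num conversion.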

Upvotes: 1
