SPS

Reputation: 475

Plot a derivative of a time series with a smoothed look in Python

I have a long pandas time series like this:

2017-11-27 16:19:00     120.0
2017-11-30 02:40:35     373.4
2017-11-30 02:40:42     624.5
2017-12-01 14:15:31     871.8
2017-12-01 14:15:33    1120.0
2017-12-07 21:07:04    1372.2
2017-12-08 06:11:50    1660.0
2017-12-08 06:11:53    1946.7
2017-12-08 06:11:57    2235.3
2017-12-08 06:12:00    2521.3
....
dtype: float64

and I want to plot it together with its derivative. I calculate the derivative from its definition, like this:

numer=myTimeSeries.diff()
denominat=myTimeSeries.index.to_series().diff().dt.total_seconds()/3600
derivative=numer/denominat

Because some values of the time delta (the denominat series) are very close to zero, or sometimes equal to it, I get some inf values in my derivative. In practice I get this:

[Figure: time series in blue (left scale), derivative in green (right scale)]

Now I would like to smooth the derivative to make it more readable. I tried different operations, like:

setting periods=5 for both numer and denominat

[image]

I also used different window types, without any useful change

to get this:

[image]

In practice I cannot find any useful improvement. What can you suggest to improve the readability of the derivative plot, if that is possible? Obviously I would cut some peaks of the derivative to obtain a smoothed curve that approximates the true values. I tried different combinations of window types, periods, etc., without any results. As for the Kalman filter, I'm a newbie, so I just used default values following this. I have also found the filterpy library, which implements the Kalman filter, but I have not found out how to use it without setting starting parameters.

Upvotes: 3

Views: 5552

Answers (2)

Abhishek Mishra

Reputation: 2004

The derivative of a function can be defined as:

f'(x) = lim_(h -> 0) (f(x + h) - f(x - h)) / (2h)

Let's assume that the derivative of your function is defined everywhere. When h is very small, you get a good approximation of the derivative; when h is very large, you get a poor one.

There is a problem applying this approach to your dataset. Sometimes h becomes so small that it gives an absurdly high gradient value; sometimes h is so large that the gradient estimate is very poor. To overcome this, let's define two time thresholds, t1 and t2. If the time difference between successive samples lies between t1 and t2, we use that point to estimate the gradient with the formula for f'(x) above. If it falls outside that range, we ignore that point.

How do we compute the gradient for the rest of the points?

We can fit a polynomial based on the points that we found in the previous step.
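
One way to read the two steps above, as a rough sketch rather than a definitive implementation: the slope between two successive samples is the central difference evaluated at their midpoint (with h equal to half the spacing), so only the intervals whose spacing lies between t1 and t2 are kept, and a polynomial is then fitted to those slope estimates and evaluated at every timestamp. The function name thresholded_derivative, the choice of hours as the time unit, and the default degree are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def thresholded_derivative(series, t1, t2, degree=2):
    """Estimate d(series)/dt per hour at every timestamp.

    t1 and t2 are the smallest and largest sample spacing (in hours)
    that are trusted for a finite-difference estimate. All names and
    defaults here are illustrative assumptions.
    """
    # Elapsed time in hours since the first sample
    hours = (series.index - series.index[0]).total_seconds().to_numpy() / 3600.0
    values = series.to_numpy(dtype=float)

    dt = np.diff(hours)   # spacing between successive samples, in hours
    dy = np.diff(values)  # change in value over each interval

    # Keep only intervals that are neither too short nor too long
    keep = (dt >= t1) & (dt <= t2)
    if keep.sum() <= degree:
        raise ValueError("not enough usable intervals for the requested degree")

    # Slope over an interval = central difference at the interval midpoint
    midpoints = hours[:-1][keep] + dt[keep] / 2.0
    slopes = dy[keep] / dt[keep]

    # Fit a polynomial to the trusted slopes and evaluate it everywhere
    coeffs = np.polyfit(midpoints, slopes, degree)
    return pd.Series(np.polyval(coeffs, hours), index=series.index)
```

For example, thresholded_derivative(myTimeSeries, t1=0.01, t2=48.0) would ignore gaps shorter than 36 seconds or longer than two days; suitable thresholds depend on the data. A single global polynomial is a crude model for a long series, so a low degree or a piecewise fit may be preferable.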

Upvotes: 0

gyoza

Reputation: 2152

If your goal is to remove "outlier" spikes in the derivative series, I would try a rolling median first instead of a rolling mean, since the median is generally less sensitive to outliers.

For example:

smotDeriv = derivative.rolling(window=10, min_periods=3, center=True).median()

Then, if you want to smooth it out further, one possible option is to apply a rolling mean on top, i.e. .rolling(...).mean().
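
Putting the two steps together, a minimal sketch might look like this. The helper name smooth_derivative and the window sizes are illustrative assumptions, and replacing inf with NaN first is an extra step not mentioned above, added so that zero time deltas are treated as missing values.

```python
import numpy as np
import pandas as pd


def smooth_derivative(derivative, median_window=10, mean_window=5, min_periods=3):
    """Remove spikes with a rolling median, then smooth with a rolling mean.

    Window sizes are placeholders (assumptions), not tuned values.
    """
    # Treat infinite slopes (zero time delta) as missing values
    finite = derivative.replace([np.inf, -np.inf], np.nan)

    # The median is robust to isolated spikes inside each window
    despiked = finite.rolling(
        window=median_window, min_periods=min_periods, center=True
    ).median()

    # A short rolling mean rounds off the steps left by the median
    return despiked.rolling(window=mean_window, min_periods=1, center=True).mean()
```

Calling smooth_derivative(derivative) on the series from the question returns a series with the same index, which can be plotted on the right-hand axis in place of the raw derivative.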

Note: Since I don't have your data at hand to play with, I'm not sure about the optimal values for window and min_periods; they depend on how much you want to smooth. Also, it seems to me that smoothing the derivative is becoming more like smoothing the original time series, so if there is a known way to smooth your original time series, that may be more straightforward.
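
If no domain-specific smoother is known, one generic possibility is sketched below: put the series on a regular time grid by time-weighted interpolation, smooth it with a centered rolling mean, and only then differentiate. Because the grid spacing is constant, the denominator can never get close to zero. The function name derivative_of_smoothed, the one-hour step, and the window size are assumptions for illustration, not values tuned to the data.

```python
import pandas as pd


def derivative_of_smoothed(series, step="1h", window=5):
    """Smooth the series on a regular grid, then differentiate (per hour).

    step and window are illustrative assumptions.
    """
    # Average samples that share a timestamp so the index is unique and sorted
    unique = series.groupby(level=0).mean()

    # Regular grid starting at the first sample
    grid = pd.date_range(unique.index[0], unique.index[-1], freq=step)

    # Interpolate linearly in time onto the grid points
    combined = unique.reindex(unique.index.union(grid)).interpolate(method="time")
    regular = combined.reindex(grid)

    # Smooth the original signal, not the derivative
    smoothed = regular.rolling(window=window, min_periods=1, center=True).mean()

    # Constant spacing: a plain difference divided by the step length in hours
    step_hours = pd.Timedelta(step).total_seconds() / 3600.0
    return smoothed.diff() / step_hours
```

The first value of the result is NaN (there is no previous grid point), and the values within about one window length of either end are less reliable, because the centered window is truncated there.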

Hope this helps.

Upvotes: 2
