Functions to smooth a time series with known dips

Question

I have the results of an Internet measurement experiment over time, as shown in the figure below, and I am analyzing the time series in pandas. There are drops in the data that are caused by server outages, and I am looking for good ways to smooth them out.

Among the simpler built-in smoothing functions, pd.rolling_max() (ts.rolling(window).max() in current pandas) gives a reasonably good estimate, although it overestimates a little. I have also experimented with writing my own smoothing function, which carries the previous value forward whenever there is a drop of more than 20%. This also gives a reasonably good estimate, but the threshold is set arbitrarily.
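For reference, here is a minimal sketch of the rolling-max baseline with the current pandas API (pd.rolling_max() has since been removed). The series below is made-up data with a two-sample outage dip, not the measurements from the question; the window of 6 matches the one used later in the question.

```python
import pandas as pd

# Hypothetical event counts with an outage dip at positions 4-5
ts = pd.Series([100, 102, 98, 101, 20, 15, 99, 103, 100, 97], dtype=float)

# Equivalent of the old pd.rolling_max(ts, 6): max of the current sample and
# the 5 before it; min_periods=1 avoids NaN for the first 5 samples
smoothed = ts.rolling(window=6, min_periods=1).max()
print(smoothed.tolist())
# [100.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 103.0, 103.0, 103.0]
```

The dip is filled, but because every output is a window maximum, normal samples such as 98 or 97 are pushed up to 102 or 103. This is the small overestimate described above.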

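import pandas as pd

def my_smooth(win, thresh=0.80):
    # win is a NumPy array holding one rolling window (raw=True below)
    win = win.copy()
    for i in range(1, len(win)):
        # A drop of more than (1 - thresh), i.e. >20%, is treated as an outage:
        # carry the previous value forward instead
        if win[i] < win[i - 1] * thresh:
            win[i] = win[i - 1]
    # Return the (possibly patched) last value of the window
    return win[-1]

# pd.rolling_apply() was removed from pandas; .rolling().apply() replaces it
ts = ts.rolling(6).apply(my_smooth, raw=True)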
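On the "optimized" part: the rolling version re-scans every 6-sample window, so each point is processed up to six times, and a carried-forward value is forgotten once it leaves the window. A single pass that remembers the last accepted value does the same kind of patching in O(n). The sketch below is an alternative, not the original function; `carry_forward_dips` and its `max_gap` parameter are hypothetical, and `max_gap=5` is an assumption chosen to roughly mimic a 6-sample window, so that a genuine, lasting drop in level is eventually accepted.

```python
import pandas as pd

def carry_forward_dips(ts, thresh=0.80, max_gap=5):
    """Replace a sample by the last accepted value when it falls below
    thresh * that value, for at most max_gap consecutive samples."""
    values = ts.to_numpy(dtype=float, copy=True)
    if len(values) == 0:
        return pd.Series(values, index=ts.index)
    last = values[0]   # last accepted (non-outage) value
    gap = 0            # number of consecutive samples carried forward so far
    for i in range(1, len(values)):
        if values[i] < last * thresh and gap < max_gap:
            values[i] = last   # treat as an outage: carry forward
            gap += 1
        else:
            last = values[i]   # accept the sample as the new reference
            gap = 0
    return pd.Series(values, index=ts.index)

# Hypothetical data with a two-sample outage dip
ts = pd.Series([100, 102, 98, 101, 20, 15, 99, 103, 100, 97], dtype=float)
print(carry_forward_dips(ts).tolist())
# [100.0, 102.0, 98.0, 101.0, 101.0, 101.0, 99.0, 103.0, 100.0, 97.0]
```

Unlike the rolling maximum, normal samples pass through unchanged; only the two outage samples are patched.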

My question is: what are better smoothing functions for this type of time series, given its specific characteristics (the data are counts of events, and the main measurement errors are large undercounts at specific times)? Also, can my smoothing function be made less ad hoc, or optimized?
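One possible way to make the 0.80 threshold less arbitrary (an assumption, not something from the original post) is to derive it from the data: compute the log-ratios of consecutive samples, and flag a drop as an outage when its log-ratio lies more than `k` robust standard deviations (scaled median absolute deviation) below the median. `robust_drop_threshold` and `k=5.0` are hypothetical choices; the result is only meaningful when the series has some normal sample-to-sample noise.

```python
import numpy as np
import pandas as pd

def robust_drop_threshold(ts, k=5.0):
    """Return a multiplicative drop threshold estimated from the data."""
    x = ts.to_numpy(dtype=float)
    # Non-positive counts have no log; mark them as missing before taking logs
    logx = np.log(np.where(x > 0, x, np.nan))
    r = np.diff(logx)            # log(x[t] / x[t-1])
    r = r[~np.isnan(r)]
    med = np.median(r)
    # 1.4826 * MAD estimates the standard deviation for normally distributed noise
    mad = 1.4826 * np.median(np.abs(r - med))
    return float(np.exp(med - k * mad))

# Hypothetical data with a two-sample outage dip
ts = pd.Series([100, 102, 98, 101, 20, 15, 99, 103, 100, 97], dtype=float)
print(robust_drop_threshold(ts))
```

For this made-up series the threshold comes out at roughly 0.62, and the returned value could be passed as `thresh` to the smoothing function above. Because the median and MAD ignore a few extreme ratios, the outages themselves barely influence the estimate.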

[Figure: measurement results over time (image not available)]
