Reputation: 822
I am trying to analyze the seasonality of the returns of a stock (but actually could be any kind of time series):
On the x axis we have the weeks and on the y axis the historical average return during each week. To clarify, each dot represents the average return (y axis) of the stock during each of the 52 weeks (x axis); the average covers the last 20 years. I'm trying to use a polynomial model to denoise the data and get a smoother signal.
I know I can get the polynomial coefficients with numpy.polyfit:
numpy.polyfit(weeks, returns, deg)
The problem is that, in the example above, the signal I get for week 52 (red circle) is completely different from the signal I get for the following week (green circle, which is week 1 of the following year). I'm trying to avoid this kind of jump from the signal of the last week of December to the signal of the first week of January. Is there a way to force polyfit to find coefficients that produce the same function value for two given x values (in my case, 1 and 52)?
Otherwise, is there anything I could do with the data to mitigate this problem? One thing I tried is adding "fake weeks" before the first one (weeks -9 to 0, which have the same Y values as weeks 43 to 52) and after the last one (weeks 53 to 62, which have the same Y values as weeks 1 to 10). This seems to help but doesn't completely fix the problem. Any ideas? Thanks
Upvotes: 0
Views: 187
Reputation: 599
You could fit with splines and force periodicity at the end points?
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import BSpline, splrep
ave_return = np.array([
0.29549823, -0.04327911, -0.28475728, 0.24133149, 0.29175083,
0.05927994, -0.19481259, 0.0682162 , 0.12219757, 0.2537674 ,
0.24648395, 0.15455555, 0.27520195, -0.01664706, -0.47437987,
-0.01138717, -0.02216335, 0.0930811 , 0.61556973, 0.30738668,
0.30734683, 0.21362355, 0.13790445, -0.15041544, -0.37567391,
-0.06940527, -0.12529933, -0.26046757, -0.34338869, -0.3451905 ,
-0.02994229, -0.04620011, -0.03362213, 0.16813838, 0.20072505,
-0.22111894, -0.23910233, -0.29322923, -0.06443125, -0.07527673,
-0.25189341, -0.16183438, -0.07362219, -0.09708203, 0.00569532,
0.23257541, 0.07938912, 0.03610597, -0.23765742, -0.32248603,
0.04504569, -0.01805558, 0.03534886,
])
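N = len(ave_return)
xx = np.linspace(0., 12., N)  # x positions spread over 12 months
# per=True treats the data as periodic over the span of xx; s controls the amount of smoothing
t, c, k = splrep(xx, ave_return, s=0.3, k=4, per=True)
spline = BSpline(t, c, k, extrapolate=False)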
plt.plot(xx, ave_return, 'bo', label='Original points')
plt.plot(xx, spline(xx), 'r', label='BSpline')
plt.grid()
plt.legend(loc='best')
plt.show()
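To check that a periodic spline really closes the loop, here is a minimal sketch that compares the fitted value and slope at both ends of the period. The data are synthetic stand-ins (not the values above), the names weeks, returns and tck are only illustrative, and it uses a cubic spline (k=3), which the SciPy docs recommend over even degrees.

```python
import numpy as np
from scipy.interpolate import splev, splrep

# Synthetic stand-in for the weekly averages (an assumption, not the asker's data).
rng = np.random.default_rng(0)
weeks = np.arange(1, 54)            # 53 positions; the last one closes the period
period = weeks[-1] - weeks[0]       # 52 weeks
returns = (0.2 * np.sin(2 * np.pi * (weeks - 1) / period)
           + 0.1 * rng.standard_normal(weeks.size))
returns[-1] = returns[0]            # with per=True the last y value is not used anyway

# Periodic smoothing spline; s sets the trade-off between smoothness and fit.
tck = splrep(weeks, returns, s=0.3, k=3, per=True)

# Value and first derivative at the two ends of the period.
start = float(splev(weeks[0], tck))
end = float(splev(weeks[-1], tck))
slope_start = float(splev(weeks[0], tck, der=1))
slope_end = float(splev(weeks[-1], tck, der=1))

print(start, end)
print(slope_start, slope_end)
```

Both pairs should agree to within floating-point error, so there is no jump and no kink when the year wraps around.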
Upvotes: 0
Reputation: 15293
This is not a job for polyfit. Fundamentally, your data represent a periodic process. One approach is to apply a real FFT and then optionally limit the bandwidth. This produces a spectral sequence that "knows" that Jan 1 and Dec 31+1 are the same thing. With a somewhat high bandwidth:
import matplotlib.pyplot as plt
import numpy as np
ave_return = np.array([
0.29549823, -0.04327911, -0.28475728, 0.24133149, 0.29175083,
0.05927994, -0.19481259, 0.0682162 , 0.12219757, 0.2537674 ,
0.24648395, 0.15455555, 0.27520195, -0.01664706, -0.47437987,
-0.01138717, -0.02216335, 0.0930811 , 0.61556973, 0.30738668,
0.30734683, 0.21362355, 0.13790445, -0.15041544, -0.37567391,
-0.06940527, -0.12529933, -0.26046757, -0.34338869, -0.3451905 ,
-0.02994229, -0.04620011, -0.03362213, 0.16813838, 0.20072505,
-0.22111894, -0.23910233, -0.29322923, -0.06443125, -0.07527673,
-0.25189341, -0.16183438, -0.07362219, -0.09708203, 0.00569532,
0.23257541, 0.07938912, 0.03610597, -0.23765742, -0.32248603,
0.04504569, -0.01805558, 0.03534886,
])
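spectrum = np.fft.rfft(ave_return)
# Zero everything from index 30 up. With 53 samples rfft returns only 27
# coefficients, so a cutoff of 30 keeps them all.
spectrum[30:] = 0
# Pass n explicitly; otherwise an odd-length input comes back one sample short.
verified = np.fft.irfft(spectrum, n=len(ave_return))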
plt.scatter(np.arange(len(ave_return)), ave_return)
plt.plot(verified)
plt.show()
Lowering that bandwidth from 30 to something like 6 makes it more obvious that the periodic sequence starts and ends at the same place.
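A minimal sketch of the low-bandwidth version, assuming synthetic data and two hypothetical helpers (lowpass_periodic and fourier_eval are not NumPy functions). The second helper evaluates the same truncated Fourier series at arbitrary positions, which lets you confirm that the value one step past the last week equals the value at the first week.

```python
import numpy as np

def lowpass_periodic(values, n_keep):
    """Keep the mean and the lowest n_keep - 1 harmonics of a periodic sequence."""
    values = np.asarray(values, dtype=float)
    spectrum = np.fft.rfft(values)
    spectrum[n_keep:] = 0
    return np.fft.irfft(spectrum, n=values.size)

def fourier_eval(values, n_keep, t):
    """Evaluate the truncated series at positions t (assumes n_keep is below Nyquist)."""
    values = np.asarray(values, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    n = values.size
    coeffs = np.fft.rfft(values)[:n_keep]
    k = np.arange(coeffs.size)
    weights = np.where(k == 0, 1.0, 2.0)   # the mean counts once, each harmonic twice
    phases = np.exp(2j * np.pi * k * t[:, None] / n)
    return (weights * coeffs * phases).real.sum(axis=1) / n

# Synthetic weekly averages (an assumption, not the data above).
rng = np.random.default_rng(1)
n_weeks = 52
week_index = np.arange(n_weeks)
noisy = (0.2 * np.sin(2 * np.pi * week_index / n_weeks)
         + 0.1 * rng.standard_normal(n_weeks))

smooth = lowpass_periodic(noisy, n_keep=6)

# Index n_weeks is "week 1 of the next year"; it lands on the same value as index 0.
wrapped = fourier_eval(noisy, 6, [0, n_weeks])
print(wrapped)
```

Because every retained term is a sine or cosine with a period that divides the year, the smoothed curve is periodic by construction and no padding with fake weeks is needed.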
Upvotes: 1
Reputation: 261
I had a similar problem when I was working with seasonal data, but I had daily records and had to find weekly seasonality.
My solution was to transform the data so that every year has exactly 52 weeks. However, 52*7 is 364, so there will be 1 extra day (2 in a leap year). I treated these extra days as part of the last week.
I used this formula (I had daily data):
df["Week"] = (df["Date"].dt.dayofyear - 1) // 7
df["Week"] = df["Week"].clip(0, 51)
After that, I calculated seasonality as a simple average:
df.groupby("Week").agg({"ValueName": "mean"})
That was a simple solution and it worked okay.
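Here is a self-contained sketch of those steps on made-up daily data, so the effect of the clipping is visible. The column names follow the snippets above; the random values and the extra "Year" column are assumptions added only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up daily records covering one normal year and one leap year.
dates = pd.date_range("2019-01-01", "2020-12-31", freq="D")
rng = np.random.default_rng(2)
df = pd.DataFrame({"Date": dates, "ValueName": rng.standard_normal(len(dates))})

# Week 0..51; days 365 and 366 would land in week 52, so clip them into week 51.
df["Week"] = ((df["Date"].dt.dayofyear - 1) // 7).clip(0, 51)

# Seasonality as the plain average per week across all years.
seasonality = df.groupby("Week")["ValueName"].mean()

# How many days each week holds in each year (helper column for illustration).
df["Year"] = df["Date"].dt.year
days_per_week = df.groupby(["Year", "Week"]).size()

print(seasonality.head())
print(days_per_week.loc[(2019, 51)], days_per_week.loc[(2020, 51)])
```

The last week ends up with 8 days in a normal year and 9 in a leap year, while every other week has exactly 7.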
Could you give a bit more information:
Upvotes: 0