Évariste Galois

Reputation: 1033

Using sklearn linear regression fit on a time series + plotting

I have the following time series returned by get_DP():

                DP
date              
1900-01-31  0.0357
1900-02-28  0.0362
1900-03-31  0.0371
1900-04-30  0.0379
...            ...
2015-09-30  0.0219

[1389 rows x 1 columns]

Note: there is a DP value for every month from 1900 to 2015; I omitted most of them to avoid clutter.

I want to run a simple regression on this DataFrame to calculate the alpha and beta (intercept and coefficient, respectively) of this financial variable. I have the following code that is intended to do so:

reg = linear_model.LinearRegression()
df = get_DP()
df=df.reset_index()
reg.fit(df['date'].values.reshape((1389,1)), df['DP'].values)
print("beta: {}".format(reg.coef_))
print("alpha: {}".format(reg.intercept_))
plt.scatter(df['date'].values.reshape((1389,1)), df['DP'].values,  color='black')
plt.plot(df['date'].values.reshape((1389,1)), df['DP'].values, color='blue', linewidth=3)

However, I believe the reshaping of my x-axis data (the dates) messes up the entire regression, because the plot looks like this: [plot]

Am I making a mistake? I'm not entirely sure what the best tool is for regression with DataFrames, since pandas removed its OLS function in 0.20.

Upvotes: 0

Views: 96

Answers (1)

ilia timofeev

Reputation: 1119

Try this:

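import matplotlib.pyplot as plt
from sklearn import linear_model

reg = linear_model.LinearRegression()
df = get_DP().reset_index()
# reshape(-1, 1) infers the row count, so 1389 need not be hard-coded
X = df.date.values.reshape(-1, 1)
y = df.DP.values.reshape(-1, 1)
reg.fit(X, y)
print("beta: {}".format(reg.coef_))
print("alpha: {}".format(reg.intercept_))
plt.scatter(df.date.dt.date, df.DP.values, color='black')
# plot the fitted values, not the raw DP values, to draw the regression line
plt.plot(df.date.dt.date, reg.predict(X), color='blue', linewidth=3)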

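See the NumPy reshape documentation: passing -1 as a dimension tells NumPy to infer that size from the array's length, so reshape(-1, 1) yields a single column without hard-coding the row count (1389).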
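If fitting on the raw datetime values gives trouble, one possible variant is to convert the dates to a plain number first. The sketch below is only an illustration under stated assumptions: get_DP() is not available here, so it builds a small made-up, exactly linear monthly series (month-start dates, 24 rows) as a stand-in, and it uses "days since the first observation" as the regressor, which is a choice of mine rather than anything from the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for get_DP(): monthly dates and a made-up, exactly linear DP series.
dates = pd.date_range("1900-01-01", periods=24, freq="MS")
days = (dates - dates[0]).days.to_numpy()
df = pd.DataFrame({"DP": 0.03 + 1e-4 * days}, index=pd.Index(dates, name="date"))
df = df.reset_index()

# Convert dates to a plain numeric feature: days elapsed since the first observation.
x = (df["date"] - df["date"].iloc[0]).dt.days.to_numpy().reshape(-1, 1)
y = df["DP"].to_numpy()

reg = LinearRegression().fit(x, y)
fitted = reg.predict(x)  # values on the regression line, one per date

print("beta (per day): {}".format(reg.coef_[0]))
print("alpha (at first date): {}".format(reg.intercept_))
```

With this encoding, beta is the change in DP per day and alpha is the fitted value at the first date. To draw the regression line over the data, the dates can stay on the x-axis: scatter df["date"] against y, then plot df["date"] against fitted.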

Upvotes: 2
