Reputation: 1033
I have the following timeseries outputted by get_DP():
                DP
date
1900-01-31  0.0357
1900-02-28  0.0362
1900-03-31  0.0371
1900-04-30  0.0379
...            ...
2015-09-30  0.0219
[1389 rows x 1 columns]
Note: there is a DP value for every month from 1900 to 2015; I omitted most rows to avoid clutter.
I want to run a simple regression on this DataFrame to calculate the alpha and beta (intercept and coefficient, respectively) of this financial variable. I have the following code that is intended to do so:
reg = linear_model.LinearRegression()
df = get_DP()
df=df.reset_index()
reg.fit(df['date'].values.reshape((1389,1)), df['DP'].values)
print("beta: {}".format(reg.coef_))
print("alpha: {}".format(reg.intercept_))
plt.scatter(df['date'].values.reshape((1389,1)), df['DP'].values, color='black')
plt.plot(df['date'].values.reshape((1389,1)), df['DP'].values, color='blue', linewidth=3)
However, I believe the reshaping of my x-axis data (the dates) messes up the entire regression, because the plot looks like so:
Am I making a mistake? I'm not entirely sure what the best tool is for regression with DataFrames, since pandas removed its OLS function in 0.20.
Upvotes: 0
Views: 96
Reputation: 1119
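Try this. It uses reshape(-1, 1) so the row count is inferred rather than hard-coded, and plots against df.date.dt.date so the x-axis shows calendar dates: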
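import matplotlib.pyplot as plt
from sklearn import linear_model

reg = linear_model.LinearRegression()
df = get_DP()
df = df.reset_index()  # turn the 'date' index into a regular column
# reshape(-1, 1) infers the number of rows instead of hard-coding 1389
reg.fit(df.date.values.reshape(-1, 1), df.DP.values.reshape(-1, 1))
print("beta: {}".format(reg.coef_))
print("alpha: {}".format(reg.intercept_))
# plot against Python date objects so the x-axis shows calendar dates
plt.scatter(df.date.dt.date, df.DP.values, color='black')
plt.plot(df.date.dt.date, df.DP.values, color='blue', linewidth=3)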
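One further point: both snippets above pass datetime values straight to fit, and plt.plot draws the raw DP series rather than the fitted line. A minimal sketch of one way around that is below. It assumes get_DP() returns a frame like the one in the question. Since that function isn't available here, a synthetic monthly series stands in for it, and the names fit_trend, plot_fit and the "years since the first observation" feature are illustrative choices, not part of the original code. Building an explicit numeric feature avoids relying on how datetime64 values get coerced.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def fit_trend(df):
    """Fit DP = alpha + beta * t, with t in years since the first date."""
    # Convert the datetimes to a plain float feature for sklearn.
    t = (df["date"] - df["date"].iloc[0]).dt.days / 365.25
    X = t.to_numpy().reshape(-1, 1)  # shape (n_samples, 1)
    y = df["DP"].to_numpy()          # shape (n_samples,)
    reg = LinearRegression().fit(X, y)
    return reg.intercept_, reg.coef_[0], reg.predict(X)


def plot_fit(df, fitted):
    """Scatter the data and draw the fitted line (not the raw series)."""
    # Imported here so the fit itself can run without a display.
    import matplotlib.pyplot as plt
    plt.scatter(df["date"], df["DP"], color="black", s=5)
    plt.plot(df["date"], fitted, color="blue", linewidth=3)
    plt.show()


# Synthetic stand-in for get_DP(): 1389 month-end dates from 1900-01 on,
# with a known downward trend plus noise (made-up numbers, not real data).
dates = (pd.period_range("1900-01", periods=1389, freq="M")
         .to_timestamp(how="end").normalize())
years = np.asarray((dates - dates[0]).days) / 365.25
rng = np.random.default_rng(0)
dp = 0.04 - 0.0001 * years + rng.normal(0.0, 0.002, len(dates))
df = pd.DataFrame({"DP": dp}, index=dates.rename("date")).reset_index()

alpha, beta, fitted = fit_trend(df)
print("alpha: {}".format(alpha))  # DP level at the first date
print("beta: {}".format(beta))    # change in DP per year
# plot_fit(df, fitted)  # uncomment to view the scatter and fitted line
```

With t measured in years, beta reads as the change in DP per year and alpha as the fitted DP at the first observation. With the real data, replace the synthetic block with df = get_DP().reset_index().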
Upvotes: 2