Reputation: 335
How can you plot the linear regression results from scikit-learn after the analysis, to see the "testing" data (real values vs. predicted values) at the end of the program? The code below is close, but I believe it is missing a scaling factor.
input:
import pandas as pd
import numpy as np
import datetime
pd.core.common.is_list_like = pd.api.types.is_list_like # temp fix
import fix_yahoo_finance as yf
from pandas_datareader import data, wb
from datetime import date
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing, cross_validation, svm
import matplotlib.pyplot as plt
df = yf.download('MMM', start = date (2012, 1, 1), end = date (2018, 1, 1) , progress = False)
df_low = df[['Low']] # create a new df with only the low column
forecast_out = int(5) # predicting some days into future
df_low['low_prediction'] = df_low[['Low']].shift(-forecast_out) # create a new column based on the existing col but shifted some days
X_low = np.array(df_low.drop(['low_prediction'], 1))
X_low = preprocessing.scale(X_low) # scaling the input values
X_low_forecast = X_low[-forecast_out:] # set X_forecast equal to last 5 days
X_low = X_low[:-forecast_out] # remove last 5 days from X
y_low = np.array(df_low['low_prediction'])
y_low = y_low[:-forecast_out]
X_low_train, X_low_test, y_low_train, y_low_test = cross_validation.train_test_split(X_low, y_low, test_size = 0.2)
clf_low = LinearRegression() # classifier
clf_low.fit(X_low_train, y_low_train) # training
confidence_low = clf_low.score(X_low_test, y_low_test) # testing
print("confidence for lows: ", confidence_low)
forecast_prediction_low = clf_low.predict(X_low_forecast)
print(forecast_prediction_low)
plt.figure(figsize = (17,9))
plt.grid(True)
plt.plot(X_low_test, color = "red")
plt.plot(y_low_test, color = "green")
plt.show()
Upvotes: 2
Views: 1335
Reputation: 173
You plot `y_test` and `X_test`, whereas you should plot `y_test` and `clf_low.predict(X_test)` if you want to compare target and predicted values.
BTW, `clf_low` in your code is not a classifier, it is a regressor. A name such as `model` describes it better than `clf`.
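A minimal sketch of that comparison follows. It uses a synthetic random-walk series in place of the downloaded 'Low' column, and `train_test_split` from `sklearn.model_selection`; the names `low`, `model` and `y_pred` and the output file name are illustrative assumptions, not part of the question's code. The apparent "missing scaling factor" probably comes from plotting the standardized inputs next to targets in price units, while the predictions are already on the same scale as `y_test`.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

# Synthetic stand-in for the 'Low' price column (assumption: not real MMM data)
rng = np.random.default_rng(0)
low = 100 + np.cumsum(rng.normal(0, 1, 500))

forecast_out = 5
X_all = scale(low.reshape(-1, 1))   # standardized feature, as in the question
X = X_all[:-forecast_out]           # drop the last 5 days, which have no target
y = low[forecast_out:]              # target: the low 5 days later, in price units

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()          # a regressor, hence "model" rather than "clf"
model.fit(X_train, y_train)
y_pred = model.predict(X_test)      # predictions share the units of y_test

fig, ax = plt.subplots(figsize=(17, 9))
ax.grid(True)
ax.plot(y_test, color="green", label="actual")
ax.plot(y_pred, color="red", label="predicted")
ax.legend()
fig.savefig("actual_vs_predicted.png")  # or call plt.show() in an interactive session
```

Because `train_test_split` shuffles the rows, the x-axis here is just the sample index within the test set, not time. A scatter plot of `y_test` against `y_pred` may be easier to read for the same reason.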
Upvotes: 2