Reputation: 8116
I've been searching Google and can't figure out what I'm doing wrong. I'm pretty new to Python and am trying to use scikit-learn on stock data, but I get the error "ValueError: matrices are not aligned" when I try to predict.
import datetime
import numpy as np
import pylab as pl
from matplotlib import finance
from matplotlib.collections import LineCollection
from sklearn import cluster, covariance, manifold, linear_model
from sklearn import datasets
###############################################################################
# Retrieve the data from Internet
# Choose a time period reasonably calm (not too long ago so that we get
# high-tech firms, and before the 2008 crash)
d1 = datetime.datetime(2003, 01, 01)
d2 = datetime.datetime(2008, 01, 01)
# kraft symbol has now changed from KFT to MDLZ in yahoo
symbol_dict = {
    'AMZN': 'Amazon'}
symbols, names = np.array(symbol_dict.items()).T
quotes = [finance.quotes_historical_yahoo(symbol, d1, d2, asobject=True)
          for symbol in symbols]
open = np.array([q.open for q in quotes]).astype(np.float)
close = np.array([q.close for q in quotes]).astype(np.float)
# The daily variations of the quotes are what carry most information
variation = close - open
#########
pl.plot(range(0, len(close[0])-20), close[0][:-20], color='black')
model = linear_model.LinearRegression(normalize=True)
model.fit([close[0][:-1]], [close[0][1:]])
print(close[0][-20:])
model.predict(close[0][-20:])
#pl.plot(range(0, 20), model.predict(close[0][-20:]), color='red')
pl.show()
The line that raises the error is:
model.predict(close[0][-20:])
I've tried nesting it in a list, converting it to a NumPy array, and anything else I could find on Google, but I have no idea what I'm doing here.
What does this error mean and why is it happening?
Upvotes: 0
Views: 2964
Reputation: 54400
Trying to predict stock prices with simple linear regression? :^| Anyway, this is what you need to change:
In [19]:
M=model.fit(close[0][:-1].reshape(-1,1), close[0][1:].reshape(-1,1))
In [31]:
M.predict(close[0][-20:].reshape(-1,1))
Out[31]:
array([[ 90.92224274],
[ 94.41875811],
[ 93.19997275],
[ 94.21895723],
[ 94.31885767],
[ 93.030142 ],
[ 90.76240203],
[ 91.29187436],
[ 92.41075928],
[ 89.0940647 ],
[ 85.10803717],
[ 86.90624508],
[ 89.39376602],
[ 90.59257129],
[ 91.27189427],
[ 91.02214318],
[ 92.86031126],
[ 94.25891741],
[ 94.45871828],
[ 92.65052033]])
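Remember, when you build a model, X for the .fit method should have the shape [n_samples, n_features], and y should have n_samples entries along its first axis. The same applies to X for the .predict method. In your code, [close[0][:-1]] is a single sample with one feature per day, so the 20 values passed to .predict do not match the number of features the model was fitted on, which is why the matrices are not aligned.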
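To make the shape difference concrete, here is a minimal sketch. It uses made-up, evenly spaced numbers in place of your Yahoo data (the `prices` array is an assumption for illustration, not real quotes), and it leaves out `normalize=True`. It only shows how wrapping in a list and calling `reshape(-1, 1)` produce different shapes, and that fit and predict agree once both use a single column.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for close[0]: 50 evenly spaced values, not real quotes.
prices = np.linspace(10.0, 20.0, 50)

# Wrapping a 1-D array in a list yields ONE sample with 49 features.
wrong_X = np.array([prices[:-1]])
print(wrong_X.shape)     # (1, 49)

# reshape(-1, 1) yields 49 samples with one feature each.
X = prices[:-1].reshape(-1, 1)   # previous day's close
y = prices[1:]                   # next day's close
print(X.shape, y.shape)  # (49, 1) (49,)

model = LinearRegression().fit(X, y)

# predict must receive the same number of columns (features) that fit saw.
X_new = prices[-20:].reshape(-1, 1)
predictions = model.predict(X_new)
print(predictions.shape)  # (20,)
```

Because y is one-dimensional here, the predictions come back as a flat array of 20 values; reshaping y to a column as well, as in the session above, returns a (20, 1) array instead.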
Upvotes: 2