Reputation: 11
I'm trying to predict stock prices using sklearn. I'm new to prediction. I tried the example from sklearn for stock prediction with gaussian hmm. But predict gives states sequence which overlay on the price and it also takes points from given input close price. My question is how to generate next 10 prices?
Upvotes: 1
Views: 3041
Reputation: 176
You will always use the last state to predict the next state, so let's add 10 days worth of inputs by changing the end date to the 23rd:
date2 = datetime.date(2012, 1, 23)
You can double check the rest of the code to make sure I am not actually using future data for the prediction. The rest of these lines can be added to the bottom of the file. First we want to find out what the expected return is for a given state. The model.means_ array has returns, both those were the returns that got us to this state, not the future returns which is what you want. To get the future returns, we consider the probability of going to any one of the 5 states, and what the return of those states is. We get the probability of going to any particular state from the model.transmat_ matrix, the for the return of each state we use the model.means_ values. We take the dot product to get the expected return for a particular state. Then we remove the volume data (you can leave it in if you want, but you seemed to be most interested in future prices).
expected_returns_and_volumes = np.dot(model.transmat_, model.means_)
returns_and_volumes_columnwise = zip(*expected_returns_and_volumes)
returns = returns_and_volumes_columnwise[0]
If you print the value for returns[0], you'll see the expected return in dollars for state 0, returns[1] for state 1 etc. Now, given a day and a state, we want to predict the price for tomorrow. You said 10 days so let's use that for lastN.
predicted_prices = []
lastN = 10
for idx in xrange(lastN):
state = hidden_states[-lastN+idx]
current_price = quotes[-lastN+idx][2]
current_date = datetime.date.fromordinal(dates[-lastN+idx])
predicted_date = current_date + datetime.timedelta(days=1)
predicted_prices.append((predicted_date, current_price + returns[state]))
print(predicted_prices)
If you were running this in "production" you would set date2 to the last date you have and then lastN would be 1. Note that I don't take into account weekends for the predicted_date.
This is a fun exercise but you probably wouldn't run this in production, hence the quotes. First, the time series is the raw price; this should really be percentage returns or log returns. Plus there is no justification for picking 5 states for the HMM, or that a HMM is even good for this kinda problem, which I doubt. They probably just picked it as an example. I think the other sklearn example using PCA is much more interesting.
Upvotes: 1