Reputation: 101
I'm creating a basic application to predict the 'Closing' value of a stock for day n+1, given the stock's features on day n, using Python and scikit-learn.
A sample row from my dataframe (2000 rows in total) looks like this:
Open Close High Low Volume
0 537.40 537.10 541.55 530.47 52877.98
This is similar to this video https://www.youtube.com/watch?v=SSu00IRRraY, where the presenter uses 'Dates' and 'Open Price'. In that example, the dates are the features and the open price is the target.
In my case, I don't have a 'Dates' value in my dataset. Instead, I want to use the Open, High, Low and Volume data as the features, because I thought that would make the prediction more accurate.
I was defining my features and targets like so:
features = df.loc[:,df.columns != 'Closing']
targets = df.loc[:,df.columns == 'Closing']
This returns dataframes that look like this.
features:
Open High Low Vol from
29 670.02 685.11 661.09 92227.36
targets:
Close
29 674.57
However, I realised that the data needs to be in a numpy array, so I now get my features and targets like this:
features = df.loc[:,df.columns != 'Closing'].values
targets = df.loc[:,df.columns == 'Closing'].values
So now my features look like this
[6.70020000e+02 6.85110000e+02 6.61090000e+02 9.22273600e+04
6.23944806e+07]
[7.78102000e+03 8.10087000e+03 7.67541000e+03 6.86188500e+04
5.41391322e+08]
and my targets look like this
[ 674.57]
[ 8042.64]
I then split up my data using
X_training, X_testing, y_training, y_testing = train_test_split(features, targets, test_size=0.8)
I tried to follow the Scikit-Learn documentation, which resulted in the following
svr_rbf = svm.SVR(kernel='rbf', C=100.0, gamma=0.0004, epsilon= 0.01 )
svr_rbf.fit(X_training, y_training)
predictions = svr_rbf.predict(X_testing)
print(predictions)
I assumed that this would predict the y values for the testing features, which I could then plot against the actual y_testing values to see how similar they are. However, the predictions come out as the same value for every row of X_testing:
[3763.84681818 3763.84681818 3763.84681818 3763.84681818 3763.84681818
I've tried changing the values of epsilon, C and gamma, but that doesn't seem to change the fact that every prediction is the same value.
I know that predicting stock prices this way might not be accurate, but I must have done something wrong to get the same value for many different test rows.
Upvotes: 1
Views: 1106
Reputation: 1928
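You should normalize your features before fitting an SVM (note that SVR is a regression model, not a classifier, but the same advice applies). SVMs are usually sensitive to non-normalized features. Your 5th feature is several orders of magnitude larger than the first three (roughly 6e7 versus about 670 in your first row), so it dominates the distance computation inside the RBF kernel. With distances that large, the kernel value between almost any two samples underflows to zero, so every test point ends up with essentially the same prediction: the model's intercept.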
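To make the dominance concrete, here is a small sketch that takes the two feature rows printed in the question and evaluates the RBF kernel between them with the question's gamma=0.0004. The helper name rbf_kernel_value is made up for illustration; it only applies the standard formula exp(-gamma * ||a - b||^2).

```python
import numpy as np

# Two feature rows copied from the output printed in the question.
x1 = np.array([6.70020000e+02, 6.85110000e+02, 6.61090000e+02,
               9.22273600e+04, 6.23944806e+07])
x2 = np.array([7.78102000e+03, 8.10087000e+03, 7.67541000e+03,
               6.86188500e+04, 5.41391322e+08])

gamma = 0.0004  # value used in the question


def rbf_kernel_value(a, b, gamma):
    """Hypothetical helper: RBF kernel exp(-gamma * squared Euclidean distance)."""
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))


# Fraction of the squared distance contributed by each feature.
squared_diffs = (x1 - x2) ** 2
contribution = squared_diffs / squared_diffs.sum()

raw_similarity = rbf_kernel_value(x1, x2, gamma)

print("share of squared distance per feature:", contribution)
print("RBF kernel value on raw features:", raw_similarity)
```

The last feature accounts for practically all of the squared distance, and the kernel value on the raw features comes out as 0.0, which is why the fitted model cannot tell the test rows apart.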
Have a look at this link which explains your issue very clearly: https://stats.stackexchange.com/questions/57010/is-it-essential-to-do-normalization-for-svm-and-random-forest
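A minimal sketch of the fix, assuming scikit-learn's StandardScaler and make_pipeline. The data here is synthetic (random numbers on roughly the same scales as the question's columns, with a made-up target), so the numbers mean nothing by themselves; the point is only to compare the spread of predictions with and without scaling. Two choices below are assumptions rather than anything from the question: gamma='scale' for the scaled model, and test_size=0.2, which holds out 20% of the rows for testing (the question's test_size=0.8 holds out 80%).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the question's dataframe (assumption, not real prices).
rng = np.random.default_rng(0)
n = 400
open_ = rng.uniform(500, 8000, n)
high = open_ * rng.uniform(1.00, 1.05, n)
low = open_ * rng.uniform(0.95, 1.00, n)
vol_from = rng.uniform(5e4, 1e5, n)
vol_to = vol_from * open_          # much larger scale than the price columns
features = np.column_stack([open_, high, low, vol_from, vol_to])
targets = (high + low) / 2         # made-up 1-D target; SVR expects shape (n_samples,)

X_train, X_test, y_train, y_test = train_test_split(
    features, targets, test_size=0.2, random_state=0
)

# Same settings as in the question, fitted on raw features.
unscaled = SVR(kernel="rbf", C=100.0, gamma=0.0004, epsilon=0.01)
unscaled.fit(X_train, y_train)
unscaled_pred = unscaled.predict(X_test)

# Standardize each feature first; the pipeline learns the scaling on the
# training split only and reuses it when predicting.
scaled = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=100.0, gamma="scale", epsilon=0.01),
)
scaled.fit(X_train, y_train)
scaled_pred = scaled.predict(X_test)

unscaled_spread = unscaled_pred.max() - unscaled_pred.min()
scaled_spread = scaled_pred.max() - scaled_pred.min()
print("prediction spread without scaling:", unscaled_spread)
print("prediction spread with scaling:   ", scaled_spread)
```

With your own data you would pass the arrays from your dataframe instead, flattening the target with something like targets.ravel() so that it is one-dimensional. C, gamma and epsilon will likely need re-tuning once the features are scaled.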
Upvotes: 1