Ben Williams

SVR predicts same value for all features

I'm creating a basic application in Python with scikit-learn to predict a stock's closing value on day n+1, given its features on day n.

My DataFrame has 2000 rows; a sample row looks like this:

       Open     Close    High     Low      Volume     
0      537.40   537.10   541.55   530.47   52877.98  

This is similar to this video, https://www.youtube.com/watch?v=SSu00IRRraY, which uses 'Dates' and 'Open Price': the dates are the features and the open price is the target.

In my case I don't have a 'Dates' column. Instead, I want to use the Open, High, Low and Volume data as features, because I thought that would make the predictions more accurate.

I first defined my features and targets like this:

features = df.loc[:, df.columns != 'Closing']
targets = df.loc[:, df.columns == 'Closing']

This returns DataFrames like these, for the features:

       Open      High      Low      Vol from  
29     670.02    685.11    661.09   92227.36

targets:

       Close
29     674.57

However, I thought the data needed to be in a NumPy array, so I now build my features and targets like this:

features = df.loc[:, df.columns != 'Closing'].values
targets = df.loc[:, df.columns == 'Closing'].values

So now my features look like this:

[6.70020000e+02 6.85110000e+02 6.61090000e+02 9.22273600e+04
  6.23944806e+07]
 [7.78102000e+03 8.10087000e+03 7.67541000e+03 6.86188500e+04
  5.41391322e+08]

And my targets look like this:

[  674.57]
[ 8042.64]
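The stated goal is to predict day n+1's close from day n's features, but df.loc[:, df.columns == 'Closing'] takes the close from the same row as the features. A minimal sketch of lining them up with pandas shift(-1), assuming the DataFrame is sorted chronologically with one row per trading day; apart from the first row (taken from the sample above), the values are hypothetical toy data, and the column names follow the sample row (Open, Close, High, Low, Volume). It also produces a 1-D target, since scikit-learn regressors expect y of shape (n_samples,) and warn when given a column vector.

```python
import pandas as pd

# Hypothetical toy rows using the sample row's column names; assumes the
# DataFrame is sorted oldest-to-newest with one row per trading day.
df = pd.DataFrame({
    "Open":   [537.40, 540.00, 545.10, 550.20],
    "Close":  [537.10, 544.30, 549.90, 548.00],
    "High":   [541.55, 546.00, 551.00, 553.10],
    "Low":    [530.47, 536.20, 542.80, 545.00],
    "Volume": [52877.98, 61000.00, 58000.00, 60500.00],
})

# Target for day n is day n+1's close: shift(-1) moves each Close up one row.
df["NextClose"] = df["Close"].shift(-1)

# The last day has no following close (NaN target), so drop it.
df = df.dropna(subset=["NextClose"])

feature_cols = ["Open", "High", "Low", "Volume"]  # day-n features
features = df[feature_cols].to_numpy()            # shape (n_days - 1, 4)
targets = df["NextClose"].to_numpy()              # 1-D target array

print(features.shape, targets.shape)  # (3, 4) (3,)
```

The .to_numpy() calls are optional: scikit-learn estimators also accept DataFrames and Series directly.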

I then split my data with:

from sklearn.model_selection import train_test_split

X_training, X_testing, y_training, y_testing = train_test_split(features, targets, test_size=0.8)
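Note that test_size=0.8 sends 80% of the rows to the test set, so a 2000-row dataset leaves only 400 rows for training. The sketch below uses placeholder arrays of the same size to show the resulting split sizes. The second split is an alternative that is an assumption, not something from the question: a 20% test set taken from the end of the series with shuffle=False, so the model is never trained on days that come after the days it is tested on.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the question's row count (2000) and 5 features.
features = np.arange(2000 * 5, dtype=float).reshape(2000, 5)
targets = np.arange(2000, dtype=float)

# test_size=0.8, as in the question: 80% of rows go to the test set.
X_tr, X_te, y_tr, y_te = train_test_split(features, targets, test_size=0.8)
print(len(X_tr), len(X_te))  # 400 1600

# Alternative for time-ordered rows: hold out the last 20% without shuffling.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    features, targets, test_size=0.2, shuffle=False
)
print(len(X_tr2), len(X_te2))  # 1600 400
print(y_te2[0])  # 1600.0, i.e. every test row comes after every training row
```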

Following the scikit-learn documentation, I then wrote:

from sklearn import svm

# RBF-kernel support vector regression with hand-picked hyperparameters
svr_rbf = svm.SVR(kernel='rbf', C=100.0, gamma=0.0004, epsilon=0.01)
svr_rbf.fit(X_training, y_training)
predictions = svr_rbf.predict(X_testing)
print(predictions)

I assumed this would predict the y values for the testing features, which I could then plot against the actual y_testing values to see how close they are. However, predictions contains the same value for every row of X_testing:

[3763.84681818 3763.84681818 3763.84681818 3763.84681818 3763.84681818

I've tried changing the values of epsilon, C and gamma, but the predictions are still the same value for every row.

I know stock prices may be hard to predict accurately, but getting exactly the same value for every row of test data suggests I have done something wrong.

Answers (1)

alift

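You should normalize (scale) your features before training an SVM, whether for classification or, as here, regression with SVR. SVMs are sensitive to features on very different scales. Your 5th feature is several orders of magnitude larger than the other four (about 6e7 to 5e8, versus roughly 7e2 to 9e4), so it dominates them. With the RBF kernel the effect is extreme: the squared distances between rows are so large that the kernel value between a test row and every support vector is essentially zero, so the model returns its intercept for every row, which is why all your predictions are identical.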
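Why the predictions are identical, rather than merely inaccurate, can be seen from the RBF-kernel SVR prediction function. A worked sketch, using the question's gamma = 0.0004 and writing $x^{(5)}$ for the fifth feature of a row $x$, $x_i$ for a support vector, and $\alpha_i, \alpha_i^*$ for its fitted dual coefficients:

$$
f(x) = \sum_{i \in \mathrm{SV}} (\alpha_i - \alpha_i^*)\,K(x_i, x) + b,
\qquad
K(x_i, x) = \exp\!\left(-\gamma \lVert x_i - x \rVert^2\right)
$$

$$
\gamma \lVert x_i - x \rVert^2 \;\ge\; \gamma \bigl(x_i^{(5)} - x^{(5)}\bigr)^2 \;\ge\; 0.0004 \times 1000^2 = 400
\quad \text{whenever } \bigl\lvert x_i^{(5)} - x^{(5)} \bigr\rvert \ge 1000,
$$

so $K(x_i, x) \le e^{-400} \approx 1.9 \times 10^{-174}$. Since the fifth feature runs from tens to hundreds of millions in the printed rows, a test row within 1000 of some support vector in that feature would be rare; in practice every term of the sum vanishes and $f(x) \approx b$ for every test row, whatever values C and epsilon take.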

Have a look at this Cross Validated question, which explains the issue clearly: https://stats.stackexchange.com/questions/57010/is-it-essential-to-do-normalization-for-svm-and-random-forest
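A sketch of the suggested fix on synthetic data (hypothetical values shaped like the question's rows, not the asker's dataset): the same SVR settings fit on raw features, versus features standardized with StandardScaler inside a make_pipeline pipeline, so the scaler is fitted on the training rows only and reused on the test rows. For the scaled model, gamma is left at its default ('scale') instead of 0.0004; that is an assumption, since a gamma chosen for the unscaled data has no particular reason to suit unit-variance features.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Hypothetical synthetic data shaped like the question's rows: price-like
# columns in the hundreds to thousands, a volume-like column in the tens of
# thousands, and a fifth column in the tens to hundreds of millions.
n = 500
price = rng.uniform(500, 8000, n)
X = np.column_stack([
    price * rng.uniform(0.98, 1.02, n),         # open-like
    price * rng.uniform(1.00, 1.04, n),         # high-like
    price * rng.uniform(0.96, 1.00, n),         # low-like
    rng.uniform(5e4, 1e5, n),                   # volume-like
    rng.permutation(np.linspace(6e7, 6e8, n)),  # huge fifth column, rows ~1e6 apart
])
y = price * rng.uniform(0.99, 1.01, n)          # close-like target, 1-D

X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

# Unscaled, with the question's hyperparameters: the fifth column dominates
# the squared distances, every RBF kernel value between distinct rows
# underflows to 0, and predict() returns the same constant for every row.
raw = SVR(kernel="rbf", C=100.0, gamma=0.0004, epsilon=0.01)
raw.fit(X_train, y_train)
raw_pred = raw.predict(X_test)

# Scaled: StandardScaler gives every column zero mean and unit variance.
# In the pipeline it is fitted on the training rows only, then applied to
# the test rows. gamma is left at its default ("scale").
scaled = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.01))
scaled.fit(X_train, y_train)
scaled_pred = scaled.predict(X_test)

print("distinct unscaled predictions:", np.unique(raw_pred).size)
print("distinct scaled predictions:  ", np.unique(scaled_pred).size)
```

On this synthetic data the unscaled model gives a single distinct prediction, because any two rows differ by about 1e6 or more in the fifth column, while the scaled model's predictions vary with the input.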
