Reputation: 17
I would like to model stock prices together with news sentiment scores using an SVM in R, to see whether news has an impact on stock prices and their prediction. I read that support vector machines (SVMs) are a good machine learning approach for this problem. My data has one column for the date of the stock price and news, one column for the stock price on that day, and four columns with sentiment scores based on different lexica. I would like to test with one of those lexica first and, if the model works, try the others. The dataset is included below. I found some examples in Python but couldn't find anything for R. I would like to use the svm() function from the e1071 package.
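I split the data into training and test sets:
set.seed(123)  # make the random split reproducible
train_idx <- sample(nrow(sentGI), floor(nrow(sentGI) * 0.70))  # avoid shadowing sample()
df.trainGI <- sentGI[train_idx, ]
df.testGI <- sentGI[-train_idx, ]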
I have already tried this SVM code, but my rate of wrong predictions is 100%:
plot(df.trainGI$GSPC.Close, df.trainGI$SentimentGI, pch = 19, col = c("red", "blue"))
svm_model_GI <- svm(SentimentGI.Class ~ ., df.trainGI)
print(svm_model_GI)
plot(svm_model_GI, df.trainGI)
svm_pred_GI <- predict(svm_model_GI, newdata = df.testGI, type="response")
rmse <- sqrt(mean((svm_pred_GI - df.testGI$GSPC.Close)^2))
rmse
What am I doing wrong here? I hope somebody can help me!
Upvotes: 1
Views: 494
Reputation: 502
You're using accuracy to evaluate the model. Accuracy is a metric for classification problems, but your response variable is continuous. You should use RMSE instead.
pred <- predict(radial.svm, newdata=df.test, type='response')
rmse <- sqrt(mean((pred - df.test$GSPC.Close)^2))
rmse
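For completeness, here is a minimal, self-contained sketch of the regression setup the snippet above relies on. It is not your actual data: the data frame is a synthetic stand-in, and the formula GSPC.Close ~ SentimentGI and the radial kernel are my assumptions about how radial.svm would be fitted. The point is only that the response in the formula should be the continuous price, so that svm() fits a regression and RMSE is a meaningful metric.

```r
library(e1071)  # provides svm()

set.seed(42)  # reproducible synthetic data and split

# Synthetic stand-in for sentGI (hypothetical values, not the asker's data)
n <- 200
sentGI <- data.frame(SentimentGI = rnorm(n, mean = 0, sd = 0.05))
sentGI$GSPC.Close <- 2000 + 500 * sentGI$SentimentGI + rnorm(n, sd = 10)

# 70/30 train/test split
train_idx <- sample(nrow(sentGI), floor(0.70 * nrow(sentGI)))
df.train <- sentGI[train_idx, ]
df.test  <- sentGI[-train_idx, ]

# A numeric response makes svm() default to regression (eps-regression)
radial.svm <- svm(GSPC.Close ~ SentimentGI, data = df.train, kernel = "radial")

# Predict on held-out rows and compute the root mean squared error
pred <- predict(radial.svm, newdata = df.test)
rmse <- sqrt(mean((pred - df.test$GSPC.Close)^2))
rmse
```

If the response were a factor such as SentimentGI.Class, svm() would fit a classifier instead, and subtracting its predictions from GSPC.Close would not be meaningful.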
Continuation from comments:
The first plot shows GSPC.Close against date (left) and the second shows SentimentGI against date (right). Notice that stock prices generally increase over time, whereas sentiment has a slope of roughly 0 over that same time frame. What does that tell you?
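If you want to check that observation numerically rather than by eye, one rough way is to regress each series on time and compare the slopes. The sketch below uses a synthetic data frame (hypothetical values, with a Date column name that I am assuming) built so that the price trends upward and the sentiment does not. Swap in your own data frame to see the real numbers.

```r
set.seed(1)

# Synthetic stand-in (hypothetical): price trends upward, sentiment does not
dates <- seq(as.Date("2018-01-01"), by = "day", length.out = 250)
day <- as.numeric(dates - dates[1])  # days since the first observation
df <- data.frame(
  Date = dates,
  day = day,
  GSPC.Close = 2500 + 2 * day + rnorm(250, sd = 20),
  SentimentGI = rnorm(250, mean = 0.05, sd = 0.02)
)

# Slope of each series against time (units per day)
price_slope <- coef(lm(GSPC.Close ~ day, data = df))[["day"]]
sent_slope  <- coef(lm(SentimentGI ~ day, data = df))[["day"]]

# Correlation between the price level and the sentiment score
level_cor <- cor(df$GSPC.Close, df$SentimentGI)

c(price_slope = price_slope, sent_slope = sent_slope, level_cor = level_cor)
```

A clearly positive price slope next to a sentiment slope near 0 is the pattern the two plots show.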
Upvotes: 1