Tom Marek
Tom Marek

Reputation: 378

Time series prediction using support vector regression

I've been trying to implement time series prediction tool using support vector regression in python language. I use SVR module from scikit-learn for non-linear Support vector regression. But I have serious problem with prediction of future events. The regression line fits the original function great (from known data) but as soon as I want to predict future steps, it returns value from the last known step.

My code looks like this:

import numpy as np
from matplotlib import pyplot as plt
from sklearn.svm import SVR

X = np.arange(0,100)
Y = np.sin(X)

svr_rbf = SVR(kernel='rbf', C=1e5, gamma=1e5)
y_rbf = svr_rbf.fit(X[:-10, np.newaxis], Y[:-10]).predict(X[:, np.newaxis])

figure = plt.figure()
tick_plot = figure.add_subplot(1, 1, 1)
tick_plot.plot(X, Y, label='data', color='green', linestyle='-')
tick_plot.axvline(x=X[-10], alpha=0.2, color='gray')
tick_plot.plot(X, y_rbf, label='data', color='blue', linestyle='--')
plt.show()

Any ideas?
thanks in advance, Tom

Upvotes: 8

Views: 7730

Answers (1)

user1149913
user1149913

Reputation: 4533

You are not really doing time-series prediction. You are trying to predict each element of Y from a single element of X, which means that you are just solving a standard kernelized regression problem.

Another problem is when computing the RBF kernel over a range of vectors [[0],[1],[2],...], you will get a band of positive values along the diagonal of the kernel matrix while values far from the diagonal will be close to zero. The test set portion of your kernel matrix is far from the diagonal and will therefore be very close to zero, which would cause all of the SVR predictions to be close to the bias term.

For time series prediction I suggest building the training test set as

 x[0]=Y[0:K]; y[0]=Y[K]
 x[1]=Y[1:K+1]; y[1]=Y[K+1]
 ...

that is, try to predict future elements of the sequence from a window of previous elements.

Upvotes: 10

Related Questions