lizarisk

Reputation: 7820

Prediction computational complexity of an SVM classifier

What is the prediction complexity for a linear SVM? Its separating surface is just a hyperplane, so it seems that prediction time shouldn't depend on the training data. At the same time, I've read that the complexity is proportional to the number of support vectors. What's the point of keeping all those support vectors in the trained classifier?

Upvotes: 3

Views: 2864

Answers (2)

Marc Claesen

Reputation: 17026

For a linear SVM the separating hyperplane can indeed be computed explicitly and saved as the model. In other words, prediction with a linear SVM model requires only the hyperplane in input space. Many specialized linear packages do exactly this (LIBLINEAR, SVMPERF, PEGASOS, ...). Even though you don't need the support vectors to predict with a linear SVM, knowing which points they are may still have all kinds of uses.
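As a rough sketch of what such a model looks like, scikit-learn's `LinearSVC` (a LIBLINEAR wrapper) exposes the hyperplane directly; the synthetic data and the `C` value below are purely illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic two-class data, just for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LinearSVC(C=1.0).fit(X, y)

# The stored model is only the hyperplane: a weight vector w and a bias b.
w, b = clf.coef_.ravel(), clf.intercept_[0]

# Prediction is one dot product per test point, independent of the
# amount of training data and of the number of support vectors.
x_test = X[0]
pred = 1 if w @ x_test + b > 0 else 0
assert pred == clf.predict([x_test])[0]
```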

When predicting with SVMs using nonlinear kernels, the story is quite different, since the separating hyperplane may be infinite-dimensional (for example, when using the RBF kernel). Computing the hyperplane itself in feature space may not even be possible, but inner products between the hyperplane and test points in feature space can be computed via kernel evaluations between the support vectors and the test points. This is the so-called kernel trick.
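A hedged sketch of that expansion, f(x) = Σᵢ αᵢyᵢ K(xᵢ, x) + b, assuming a scikit-learn `SVC`, whose `dual_coef_` already stores the products αᵢyᵢ; the data and the `gamma` value are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

def decision(x):
    # One RBF kernel evaluation per support vector: the cost grows
    # with the number of support vectors kept in the model.
    k = np.exp(-gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
    return clf.dual_coef_[0] @ k + clf.intercept_[0]

x_test = X[0]
assert np.isclose(decision(x_test), clf.decision_function([x_test])[0])
```

Note that the cost of `decision()` scales with the number of support vectors, which is exactly the prediction complexity the question asks about.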

Most general packages which support both linear and nonlinear kernels (such as LIBSVM) tend to save all models in the same way. This means that linear models are stored and evaluated in terms of inner products between test points and support vectors, just like a nonlinear model would be. Clearly, this is more work than necessary for a linear SVM. The vast majority of people use general packages for SVMs instead of specialized linear ones, even when training linear SVMs, which is probably why many people erroneously assume that prediction with a linear SVM always relies on support vectors.
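To make the overhead concrete: for a linear kernel the support-vector expansion collapses into a single weight vector, so the O(#SVs · d) evaluation can be replaced by one O(d) dot product. A minimal sketch, using scikit-learn's `SVC` as a stand-in for a LIBSVM-style model (data and `C` are again illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)  # LIBSVM-style linear model

# The stored model evaluates sum_i alpha_i y_i <sv_i, x> + b. Because the
# kernel is linear, the sum collapses into a single weight vector:
w = clf.dual_coef_ @ clf.support_vectors_    # shape (1, n_features)
b = clf.intercept_

x_test = X[:1]
slow = clf.decision_function(x_test)         # O(#SVs * d) per point
fast = x_test @ w.T + b                      # O(d) per point
assert np.allclose(slow, fast)
```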

Upvotes: 7

Davis King

Reputation: 4791

You are right, the prediction time does not depend on the amount of training data for a linear SVM. This is because the predictor is just a dot product between a test vector and the learned weight vector.

There is no point in keeping the support vectors around; anyone who says otherwise is confused :). If for some reason you wanted to know what the support vectors were later on, you can easily find out by evaluating the classifier on the training data. All the training samples that get mispredicted, or whose SVM output is less than 1 in absolute value, are the support vectors.
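A minimal sketch of that recovery, assuming a scikit-learn linear-kernel `SVC` and synthetic data; the tolerance below is an arbitrary slack for the solver's finite precision, and points sitting exactly on the margin could in principle differ:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
y_signed = np.where(y == 1, 1, -1)            # labels as +1 / -1

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Support vectors are the training points with y * f(x) <= 1:
# mispredicted points (y * f(x) < 0) plus points on or inside the margin.
f = clf.decision_function(X)
recovered = np.flatnonzero(y_signed * f <= 1 + 1e-3)  # slack for solver tolerance

print(np.array_equal(np.sort(clf.support_), recovered))  # expect True
```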

Upvotes: 5
