lizarisk

Reputation: 7820

Prediction computational complexity of an SVM classifier

What is the prediction complexity for a linear SVM? Its separating surface is just a hyperplane, so it seems that prediction time shouldn't depend on the training data. At the same time, I've read that the complexity is proportional to the number of support vectors. What's the point of keeping all those support vectors in the trained classifier?

Upvotes: 3

Views: 2864

Answers (2)

Marc Claesen

Reputation: 17026

For a linear SVM the separating hyperplane can indeed be computed explicitly and saved as the model. In other words, prediction with a linear SVM model requires only the hyperplane in input space. Many specialized linear packages do exactly this (LIBLINEAR, SVMPERF, PEGASOS, ...). Even though you don't need the support vectors to predict with a linear SVM, knowing which points they are may still have all kinds of uses.
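As a rough sketch of what such a model looks like, scikit-learn's `LinearSVC` (a LIBLINEAR wrapper) exposes the hyperplane directly; the synthetic data and the `C` value below are purely illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic two-class data, just for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LinearSVC(C=1.0).fit(X, y)

# The stored model is only the hyperplane: a weight vector w and a bias b.
w, b = clf.coef_.ravel(), clf.intercept_[0]

# Prediction is one dot product per test point, independent of the
# amount of training data and of the number of support vectors.
x_test = X[0]
pred = 1 if w @ x_test + b > 0 else 0
assert pred == clf.predict([x_test])[0]
```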

When predicting with SVMs using nonlinear kernels, the story is quite different, since the separating hyperplane may be infinite-dimensional (for example, when using the RBF kernel). Computing the hyperplane itself in feature space may not even be possible, but inner products between the hyperplane and test points in feature space can be computed via kernel evaluations between the support vectors and the test points. This is the so-called kernel trick.
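A hedged sketch of that expansion, f(x) = Σᵢ αᵢyᵢ K(xᵢ, x) + b, assuming a scikit-learn `SVC`, whose `dual_coef_` already stores the products αᵢyᵢ; the data and the `gamma` value are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

def decision(x):
    # One RBF kernel evaluation per support vector: the cost grows
    # with the number of support vectors kept in the model.
    k = np.exp(-gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
    return clf.dual_coef_[0] @ k + clf.intercept_[0]

x_test = X[0]
assert np.isclose(decision(x_test), clf.decision_function([x_test])[0])
```

Note that the cost of `decision()` scales with the number of support vectors, which is exactly the prediction complexity the question asks about.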

Most general packages which support both linear and nonlinear kernels (such as LIBSVM) tend to save all models in the same way. This means that linear models are stored and evaluated in terms of inner products between test points and support vectors, just like a nonlinear model would be. Clearly, this is more work than necessary for a linear SVM. The vast majority of people use general packages for SVMs instead of specialized linear ones, even when training linear SVMs, which is probably why many people erroneously assume that prediction with a linear SVM always relies on support vectors.
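To make the overhead concrete: for a linear kernel the support-vector expansion collapses into a single weight vector, so the O(#SVs · d) evaluation can be replaced by one O(d) dot product. A minimal sketch, using scikit-learn's `SVC` as a stand-in for a LIBSVM-style model (data and `C` are again illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)  # LIBSVM-style linear model

# The stored model evaluates sum_i alpha_i y_i <sv_i, x> + b. Because the
# kernel is linear, the sum collapses into a single weight vector:
w = clf.dual_coef_ @ clf.support_vectors_    # shape (1, n_features)
b = clf.intercept_

x_test = X[:1]
slow = clf.decision_function(x_test)         # O(#SVs * d) per point
fast = x_test @ w.T + b                      # O(d) per point
assert np.allclose(slow, fast)
```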

Upvotes: 7

Davis King

Reputation: 4791

You are right, the prediction time does not depend on the amount of training data for a linear SVM. This is because the predictor is just a dot product between a test vector and the learned weight vector.

There is no point in keeping the support vectors around; anyone who says otherwise is confused :). If for some reason you wanted to know what the support vectors were later on, you can easily find out by evaluating the classifier on the training data. All the training samples that get mispredicted, or whose SVM output is less than 1 in absolute value, are the support vectors.
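A minimal sketch of that recovery, assuming a scikit-learn linear-kernel `SVC` and synthetic data; the tolerance below is an arbitrary slack for the solver's finite precision, and points sitting exactly on the margin could in principle differ:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
y_signed = np.where(y == 1, 1, -1)            # labels as +1 / -1

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Support vectors are the training points with y * f(x) <= 1:
# mispredicted points (y * f(x) < 0) plus points on or inside the margin.
f = clf.decision_function(X)
recovered = np.flatnonzero(y_signed * f <= 1 + 1e-3)  # slack for solver tolerance

print(np.array_equal(np.sort(clf.support_), recovered))  # expect True
```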

Upvotes: 5
