Reputation: 11
I have been looking for a fast linear SVM library and came across two of the most prominent ones, LIBLINEAR and Pegasos. According to the LIBLINEAR paper, LIBLINEAR outperforms Pegasos. However, Pegasos claims to be fast when the data is sparse. Since Pegasos came out earlier, its documentation contains no comparison with LIBLINEAR.
So which one should I choose for sparse data?
Upvotes: 1
Views: 576
Reputation: 28768
As far as I know, sparse data is handled fine by both. The question is more about the number of data points. Liblinear has solvers for both the primal and the dual, and these solve the problem to high precision without any need to tune parameters. For Pegasos or similar subgradient descent solvers (if you want one of these, I'd recommend Leon Bottou's sgd), the result depends strongly on the initial learning rate and the learning rate schedule, which can be tricky to tune.
As a rule of thumb, if I have fewer than 10k data points, I'd always use liblinear (with the primal solver), maybe even up to 100k. Above that, I'd consider using SGD if I feel liblinear is too slow. Even if liblinear is slightly slower, I prefer using it because it means I don't have to think about the learning rate, learning rate decay and number of epochs.
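To make the tuning concern concrete, here is a minimal pure-Python sketch of a Pegasos-style stochastic subgradient update on sparse examples stored as dicts. It assumes the step size eta_t = 1 / (lam * t) from the Pegasos paper and leaves out the optional projection step and mini-batching. The function names, the toy data, and the values of `lam` and `n_iters` are all made up for illustration; they are exactly the knobs you would have to pick yourself.

```python
import random


def dot(w, x):
    """Sparse dot product; w and x are dicts mapping feature index -> value."""
    return sum(w.get(j, 0.0) * v for j, v in x.items())


def pegasos_train(data, lam=0.01, n_iters=2000, seed=0):
    """Pegasos-style SGD for a linear SVM with hinge loss (no bias term).

    data: list of (x, y) pairs, x a sparse dict, y in {-1, +1}.
    lam and n_iters are illustrative defaults, not recommended values.
    """
    rng = random.Random(seed)
    w = {}
    for t in range(1, n_iters + 1):
        x, y = rng.choice(data)      # pick one training example at random
        eta = 1.0 / (lam * t)        # assumed schedule: eta_t = 1 / (lam * t)
        margin = y * dot(w, x)       # margin under the current weights
        # Regularization step: w <- (1 - eta * lam) * w, i.e. (1 - 1/t) * w.
        # This loop touches every stored weight; real implementations usually
        # keep a separate scalar scale factor to stay cheap on sparse data.
        shrink = 1.0 - 1.0 / t
        for j in w:
            w[j] *= shrink
        # Hinge-loss subgradient step, only when the margin is violated.
        if margin < 1.0:
            for j, v in x.items():
                w[j] = w.get(j, 0.0) + eta * y * v
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0.0 else -1


# Toy sparse data: feature 0 marks the positive class, feature 1 the negative
# class, and features 2 and 3 are uninformative.
toy_data = [
    ({0: 1.0, 2: 0.3}, 1),
    ({0: 1.0, 3: 0.3}, 1),
    ({1: 1.0, 2: 0.3}, -1),
    ({1: 1.0, 3: 0.3}, -1),
]

w = pegasos_train(toy_data)
print({j: round(v, 3) for j, v in sorted(w.items())})
```

Changing `lam` or `n_iters` changes both the step sizes and where the weights end up, which is the sort of thing liblinear spares you from.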
Btw, you can very easily compare these different solvers using a framework like scikit-learn, which includes SGD, Liblinear and LibSVM solvers, or lightning, which includes A LOT of solvers.
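Such a comparison in scikit-learn could look roughly like the sketch below. It uses `LinearSVC` (backed by liblinear, here with `dual=False` for the primal solver) and `SGDClassifier` with hinge loss on synthetic sparse data. The data generator, the problem size and the hyperparameter values are assumptions chosen only to keep the example small, so the timings say nothing about your own data.

```python
import time

import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import SGDClassifier
from sklearn.svm import LinearSVC


def make_sparse_problem(n_samples=2000, n_features=500, density=0.02, seed=0):
    """Synthetic sparse binary problem (illustrative only)."""
    rng = np.random.RandomState(seed)
    X = sp.random(n_samples, n_features, density=density,
                  format="csr", random_state=rng)
    w_true = rng.randn(n_features)
    y = (X @ w_true > 0).astype(int)   # labels from a hidden linear rule
    return X, y


def compare_solvers(X, y):
    """Fit both solvers; return {name: (fit time in seconds, train accuracy)}."""
    models = {
        "liblinear (primal)": LinearSVC(dual=False, C=1.0),
        "sgd (hinge)": SGDClassifier(loss="hinge", alpha=1e-4,
                                     max_iter=1000, tol=1e-3, random_state=0),
    }
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X, y)                # both estimators accept CSR input
        elapsed = time.perf_counter() - start
        results[name] = (elapsed, model.score(X, y))
    return results


X, y = make_sparse_problem()
results = compare_solvers(X, y)
for name, (elapsed, acc) in results.items():
    print(f"{name}: {elapsed:.3f}s, train accuracy {acc:.3f}")
```

For a real decision you would run this on your own matrix and score on a held-out split rather than on the training data. Note that `alpha` and `max_iter` for the SGD model still need tuning, while the liblinear model mostly just needs `C`.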
Upvotes: 3
Reputation: 143
Both LIBLINEAR and Pegasos are linear classification techniques that were specifically developed to deal with large sparse data with a huge number of instances and features. It is only on this kind of data that they are faster than a traditional SVM.
I have never used Pegasos, but I can assure you that LIBLINEAR is very fast with this kind of data, and the authors say that "it is competitive with or even faster than state of the art linear classifiers such as Pegasos".
Upvotes: 1