svm.sparse.SVC taking a lot of time to get trained

Question

I am trying to train svm.sparse.SVC in scikit-learn. The feature vectors have around 0.7 million dimensions, and I am using 20k of them for training. I pass the input as CSR sparse matrices, since only around 500 dimensions are non-zero in each feature vector. The code has been running for the past 5 hours. Is there any estimate of how much longer it will take? Is there any way to make the training faster? The kernel is linear.

mbatchkarov · Accepted Answer

Try using sklearn.svm.LinearSVC. It is also a linear SVM, but the underlying implementation is liblinear rather than libsvm, and liblinear is known to be faster for linear models. That said, your data set is not small, so even this classifier might take a while.
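A minimal sketch of what that could look like. The data here is a synthetic stand-in (a much smaller random CSR matrix with made-up labels), and C=1.0 is just a starting value, so treat the sizes and settings as assumptions rather than a recipe for your problem:

```python
import numpy as np
from scipy import sparse
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real data (an assumption): 2,000 samples,
# 50,000 features, about 50 non-zero entries per row, stored as CSR.
rng = np.random.RandomState(0)
X = sparse.random(2000, 50000, density=0.001, format="csr", random_state=rng)

# Hypothetical binary labels derived from a random linear rule.
w = rng.randn(50000)
y = (X @ w > 0).astype(int)

# LinearSVC accepts scipy sparse matrices directly.
clf = LinearSVC(C=1.0)
clf.fit(X, y)

train_accuracy = clf.score(X, y)
print(train_accuracy)
```

Timing this on a subsample of your own data and then increasing the sample size should give a rough feel for how the training time grows.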

Edit after first comment: In that case I think you have several options, none of which is perfect:

The non-solution option: call it a day and hope that the training of svm.sparse.SVC will have finished by tomorrow morning. If you can, buy a better computer.
The cheat option: give up on probabilities. You haven't told us what your problem is, so they may not be essential.
The back-against-the-wall option: if you absolutely need probabilities and things must run faster, use a different classifier. Options include sklearn.naive_bayes.*, sklearn.linear_model.LogisticRegression, etc. These will be much faster to train, but the price you may pay is somewhat reduced accuracy.
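For the last option, here is a rough sketch of getting probabilities from two of those classifiers on sparse input. Again the data is synthetic and much smaller than yours, the labels are made up, and max_iter=1000 is an arbitrary choice. MultinomialNB is picked as one example from sklearn.naive_bayes and assumes non-negative features, which holds for this toy matrix but may not hold for your data:

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Synthetic stand-in for the real data (an assumption): non-negative
# values in a CSR matrix, about 50 non-zero entries per row.
rng = np.random.RandomState(0)
X = sparse.random(2000, 50000, density=0.001, format="csr", random_state=rng)

# Hypothetical binary labels derived from a random linear rule.
w = rng.randn(50000)
y = (X @ w > 0).astype(int)

# Logistic regression: a linear model that outputs class probabilities.
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X, y)
proba_lr = log_reg.predict_proba(X[:5])

# Multinomial naive Bayes: requires non-negative feature values.
nb = MultinomialNB()
nb.fit(X, y)
proba_nb = nb.predict_proba(X[:5])

print(proba_lr)
print(proba_nb)
```

Each row of the predict_proba output holds one probability per class and sums to 1.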
