Reputation: 851
I am trying to train svm.sparse.SVC in scikit-learn. The feature vectors have around 0.7 million dimensions, and I am using 20k of them for training. I provide the input as CSR sparse matrices, since only around 500 dimensions are non-zero in each feature vector. The code has been running for the past 5 hours. Is there any estimate of how long it will take? Is there any way to make the training faster? The kernel is linear.
Upvotes: 2
Views: 1990
Reputation: 16049
Try using sklearn.svm.LinearSVC. This also has a linear kernel, but the underlying implementation is liblinear, which is known to be faster. That said, your data set isn't very small, so even this classifier might take a while.
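A minimal sketch of what that swap might look like. The data here is synthetic and scaled down (2,000 samples, 50,000 features, roughly 50 non-zeros per row) so it runs in seconds; the sizes, the random labels, and `C=1.0` are assumptions for illustration, not your actual setup. LinearSVC accepts a CSR matrix directly, so the input does not need to be densified.

```python
import numpy as np
from scipy import sparse
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real data; all sizes are assumptions,
# scaled down from the question's 20k x 0.7M so this runs quickly.
rng = np.random.RandomState(0)
n_samples, n_features = 2000, 50000
X = sparse.random(n_samples, n_features, density=0.001,
                  format="csr", random_state=rng)

# Hypothetical labels: the sign of a random linear function of the features.
w = rng.randn(n_features)
y = (X @ w > 0).astype(int)

# liblinear-backed linear SVM; the CSR matrix is passed in as-is.
clf = LinearSVC(C=1.0)
clf.fit(X, y)

train_acc = clf.score(X, y)
print("training accuracy:", train_acc)
```

Timing a run like this on a subsample of your own data, then growing the subsample, should give a rough idea of how the training time scales before you commit to the full 20k vectors.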
Edit after first comment: In that case, I think you have a couple of options, neither of which is perfect:
- Wait and hope that svm.sparse.SVC has finished by tomorrow morning. If you can, buy a better computer.
- Try a different classifier, such as sklearn.naive_bayes.*, sklearn.linear_model.LogisticRegression, etc. These will be much faster to train, but the price you pay is somewhat reduced accuracy.
Upvotes: 3
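As a rough sketch of trying the faster classifiers named in the answer, the snippet below fits LogisticRegression and MultinomialNB on small synthetic sparse data and records the fit time of each. The data sizes, labels, and `max_iter=200` are assumptions for illustration. Note that MultinomialNB expects non-negative feature values, which holds for this synthetic matrix but may not hold for your features.

```python
import time

import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Synthetic, scaled-down stand-in for the real data (sizes are assumptions).
# sparse.random draws values in [0, 1), so the matrix is non-negative.
rng = np.random.RandomState(0)
X = sparse.random(2000, 50000, density=0.001, format="csr", random_state=rng)
y = (X @ rng.randn(50000) > 0).astype(int)  # hypothetical labels

fit_times = {}
for name, model in [
    ("LogisticRegression", LogisticRegression(max_iter=200)),
    ("MultinomialNB", MultinomialNB()),
]:
    start = time.perf_counter()
    model.fit(X, y)  # both estimators accept CSR input directly
    fit_times[name] = time.perf_counter() - start
    print(name, "fit in", round(fit_times[name], 3), "s,",
          "training accuracy:", model.score(X, y))
```

The accuracies printed here come from random synthetic labels, so they say nothing about how these models would compare on the real data; only the relative training times are of interest.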