Reputation: 851
I need to train an SVM classifier in sklearn. The feature vectors have hundreds of thousands of dimensions, and there are tens of thousands of such vectors. However, each dimension can only be 0, 1 or -1, and only about 100 entries are non-zero in each feature vector. Is there an efficient way to pass these feature vectors to the classifier?
Upvotes: 1
Views: 2681
Reputation: 40169
I need to train the svm classifier in sklearn.
You mean sklearn.svm.SVC? For high dimensional sparse data and many samples, LinearSVC, LogisticRegression, PassiveAggressiveClassifier or SGDClassifier can be much faster to train for comparable predictive accuracy.
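To illustrate the point about linear models, here is a minimal sketch that fits LinearSVC and SGDClassifier directly on a scipy.sparse matrix. The data is synthetic and purely an assumption for illustration: the sizes are scaled down, the labels come from a made-up hidden weight vector, and the printed training accuracy says nothing about real data.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import SGDClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Scaled-down, hypothetical sizes; the question mentions far larger ones.
n_samples, n_features, nnz_per_row = 500, 50_000, 100

# Build a CSR matrix with exactly 100 non-zeros per row, each +1 or -1.
indices = np.concatenate([
    rng.choice(n_features, size=nnz_per_row, replace=False)
    for _ in range(n_samples)
])
data = rng.choice([-1.0, 1.0], size=n_samples * nnz_per_row)
indptr = np.arange(0, n_samples * nnz_per_row + 1, nnz_per_row)
X = sparse.csr_matrix((data, indices, indptr), shape=(n_samples, n_features))

# Synthetic labels from a made-up linear model (assumption, not real data).
w_true = rng.standard_normal(n_features)
y = (X @ w_true > 0).astype(int)

# Both estimators accept the sparse matrix as-is; no dense copy is needed.
for clf in (LinearSVC(), SGDClassifier(loss="hinge", random_state=0)):
    clf.fit(X, y)
    print(type(clf).__name__, "training accuracy:", clf.score(X, y))
```

The estimators are left at their default hyperparameters here; on real data you would tune C (for LinearSVC) or alpha (for SGDClassifier) with cross-validation.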
The dimensions of the feature vectors go in hundreds of thousands and there are tens of thousands of such feature vectors. However, each dimension can be 0, 1 or -1. Only some 100 are non-zero in each feature vector. Any efficient way to give the info about the feature vectors to the classifier?
Find a way to load your data as a scipy.sparse matrix, which does not store the zeros in memory. Have a look at the documentation on feature extraction. It provides tools for doing that, and which one fits depends on how your original data is represented.
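As a rough sketch of what loading the data as a sparse matrix could look like, the example below shows two options. The sample data, the feature names ("f3", "f17", ...) and the column count are all hypothetical; which option applies depends on how your raw data is stored.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction import DictVectorizer

# Hypothetical raw data: one dict per sample, mapping a feature name to +1 or -1.
# Features missing from a dict are implicitly zero and are never stored.
samples = [
    {"f3": 1, "f17": -1},
    {"f0": -1, "f3": 1, "f42": 1},
    {"f17": 1},
]

# Option 1: DictVectorizer returns a scipy.sparse matrix by default.
# It creates one column per feature name seen during fitting.
vectorizer = DictVectorizer()
X_dict = vectorizer.fit_transform(samples)

# Option 2: if the features are already integer column indices, build the
# matrix from (row, column, value) triplets and convert it to CSR.
rows = np.array([0, 0, 1, 1, 1, 2])
cols = np.array([3, 17, 0, 3, 42, 17])
vals = np.array([1, -1, -1, 1, 1, 1], dtype=np.int8)
n_features = 100_000  # assumed total dimensionality
X_csr = sparse.coo_matrix((vals, (rows, cols)), shape=(3, n_features)).tocsr()

print(X_dict.shape, X_csr.shape, X_csr.nnz)
```

Either matrix can be passed straight to fit() of the estimators mentioned above. Only the non-zero entries are stored, so memory use scales with roughly 100 values per sample rather than with the full number of dimensions.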
Upvotes: 2