Reputation: 6615
I am using the scikit-learn library for SVM. I have a huge amount of data which I can't read in all at once to pass to the fit() function.
I want to iterate over all my data, which is in a file, and train the SVM one sample at a time. Is there any way to do this? It is not clear from the documentation, and in their tutorial they pass the complete data to fit at once.
Is there any way to train it incrementally (maybe something like calling fit for every input pattern of the training data)?
Upvotes: 6
Views: 2294
Reputation: 40149
Support Vector Machine (at least as implemented in libsvm, which scikit-learn wraps) is fundamentally a batch algorithm: it needs to have access to all the data in memory at once. Hence it is not scalable to datasets that do not fit in memory.
Instead you should use models that support incremental learning via the partial_fit method. For instance, some linear models such as sklearn.linear_model.SGDClassifier support the partial_fit method. You can slice your dataset and load it as a sequence of minibatches with shape (batch_size, n_features). batch_size can be 1, but that is inefficient because of the Python interpreter overhead (plus the data-loading overhead). So it is recommended to load samples in minibatches of at least 100.
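For illustration, here is a minimal sketch of that pattern. The file name, the assumption that the label is in the last column of a CSV, and the batch size of 100 are all hypothetical and not part of the question; the key point is the repeated partial_fit calls with the full class list passed up front.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# hinge loss gives a linear SVM trained by stochastic gradient descent
clf = SGDClassifier(loss="hinge")
classes = np.array([0, 1])  # partial_fit needs all class labels on the first call

batch_size = 100  # assumed minibatch size
batch = []

with open("data.csv") as f:  # hypothetical file: features, label in last column
    for line in f:
        batch.append([float(v) for v in line.strip().split(",")])
        if len(batch) == batch_size:
            data = np.array(batch)
            X, y = data[:, :-1], data[:, -1]
            clf.partial_fit(X, y, classes=classes)
            batch = []

# train on the final, possibly smaller, batch
if batch:
    data = np.array(batch)
    clf.partial_fit(data[:, :-1], data[:, -1], classes=classes)
```

Each call to partial_fit updates the model with just the current minibatch, so only batch_size rows ever need to be in memory at once.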
Upvotes: 15