Troll_Hunter

Reputation: 515

Does scikit-learn's .fit(X, y) method work sequentially, and if not, how does it work?

I am using scikit-learn's svm library for classifying images. I was wondering whether, when I fit training data, the classifier works incrementally, or whether it erases the previous fit and retrains from scratch on the new data. For example, if I fit 100 images to the classifier, can I then go ahead and fit another 100 images, or will the SVM discard the work it performed on the original 100 images? This is difficult for me to explain, so I'll provide an example.

In order to fit a SVM classifier to 200 images can I do this:

from sklearn.svm import SVC

clf = SVC(kernel='linear')
clf.fit(test.data[0:100], test.target[0:100])
clf.fit(test.data[100:200], test.target[100:200])

Or must I do this:

clf = SVC(kernel='linear')
clf.fit(test.data[:200], test.target[:200])

I ask only because I run into memory errors when trying to use .fit(X, y) with too many images at once. So is it possible to call fit sequentially and "increment" my classifier, so that it is technically trained on 10000 images but only sees 100 at a time?

If this is possible, please confirm and explain; and if it's not possible, please explain why.

Upvotes: 2

Views: 3257

Answers (1)

Ibraim Ganiev

Reputation: 9390

http://scikit-learn.org/stable/developers/index.html#estimated-attributes

"The last-mentioned attributes are expected to be overridden when you call fit a second time without taking any previous value into account: fit should be idempotent."

https://en.wikipedia.org/wiki/Idempotent

So yes, the second call will erase the old model and compute a new one. You can verify this yourself if you read the Python source, for example in sklearn/svm/classes.py.
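
A minimal sketch of what that means in practice (assuming the same test.data / test.target arrays from your question): the trailing-underscore attributes that fit sets are recomputed from scratch on every call, so nothing learned from the first 100 images survives the second fit.

import numpy as np
from sklearn.svm import SVC

clf = SVC(kernel='linear')
clf.fit(test.data[0:100], test.target[0:100])
old_support = clf.support_vectors_.copy()   # learned state from the first fit

clf.fit(test.data[100:200], test.target[100:200])
# support_vectors_ has been recomputed from the second batch only
print(np.array_equal(old_support, clf.support_vectors_))   # False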

I think you need minibatch training, but I don't see a partial_fit implementation for SVC; maybe that's because the scikit-learn team recommends SGDClassifier and SGDRegressor for datasets with more than 100k samples (http://scikit-learn.org/stable/tutorial/machine_learning_map/). Try to use them with mini-batches via partial_fit.
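
For example, a rough sketch of that mini-batch approach (assuming your test.data / test.target arrays, and loss='hinge', which makes SGDClassifier behave like a linear SVM):

import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss='hinge')      # hinge loss ~ linear SVM
classes = np.unique(test.target)       # partial_fit needs the full label set up front

# feed the data 100 images at a time instead of all at once
for start in range(0, len(test.data), 100):
    X_batch = test.data[start:start + 100]
    y_batch = test.target[start:start + 100]
    clf.partial_fit(X_batch, y_batch, classes=classes)

Unlike fit, each partial_fit call updates the existing model rather than replacing it, so memory use stays bounded by the batch size.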

Upvotes: 3
