sweeeeeet

Reputation: 1819

fit method in sklearn

I am asking myself various questions about the fit method in sklearn.

Question 1: when I do:

from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = model.fit(X2)

Does the content of the variable model change at all during this process?

Question 2: when I do:

from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = svd_1.fit(X2)

What is happening to svd_1? In other words, svd_1 has already been fitted and I fit it again, so what happens to its components?

Upvotes: 8

Views: 33687

Answers (2)

MB-F

Reputation: 23647

Question 1: Does the content of the variable model change at all during this process?

Yes. The fit method modifies the object in place and returns a reference to that same object, so take care: in the first example, all three variables model, svd_1, and svd_2 refer to the same object.

from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = model.fit(X2)
print(model is svd_1 is svd_2)  # prints True

Question 2: What is happening to svd_1?

model and svd_1 refer to the same object, so there is absolutely no difference between the first and the second example.

Final Remark: What happens in both examples is that the result of fit(X1) is overwritten by fit(X2), as pointed out in the answer by David Maust. If you want to have two different models fitted to two different sets of data you need to do something like this:

svd_1 = TruncatedSVD().fit(X1)
svd_2 = TruncatedSVD().fit(X2)
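
To make the difference concrete, here is a minimal sketch that compares refitting one estimator with fitting two separate instances. The random X1 and X2 matrices, n_components=2, and random_state=0 are assumptions chosen only for illustration; they are not from the question.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical data standing in for the X1 and X2 of the question.
rng = np.random.RandomState(0)
X1 = rng.rand(20, 5)
X2 = rng.rand(20, 5)

# One estimator fitted twice: keep a copy of the first components,
# then refit on X2 and check whether the stored components changed.
model = TruncatedSVD(n_components=2, random_state=0)
first_components = model.fit(X1).components_.copy()
model.fit(X2)
refit_changed = not np.allclose(first_components, model.components_)

# Two separate estimators: each keeps the components of its own data.
svd_1 = TruncatedSVD(n_components=2, random_state=0).fit(X1)
svd_2 = TruncatedSVD(n_components=2, random_state=0).fit(X2)
svd_1_kept = np.allclose(first_components, svd_1.components_)

print(refit_changed, svd_1_kept, svd_1 is svd_2)
```

The fixed random_state is only there so that the two fits on X1 are comparable; without it the randomized solver may give slightly different components from run to run.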

Upvotes: 12

David Maust

Reputation: 8270

When you call fit on TruncatedSVD, it replaces the components with those computed from the new matrix. Some estimators and transformers in scikit-learn, such as IncrementalPCA, have a partial_fit method that builds the model incrementally as additional data is supplied.
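
As a rough illustration of the difference between partial_fit and fit, the sketch below tracks how many samples an IncrementalPCA has seen. The random data, the batch sizes, and n_components=2 are assumptions made up for this example.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Hypothetical data: two batches with the same number of features.
rng = np.random.RandomState(0)
X1 = rng.rand(50, 5)
X2 = rng.rand(50, 5)

ipca = IncrementalPCA(n_components=2)

# partial_fit accumulates: the second call updates the existing model.
ipca.partial_fit(X1)
seen_after_first = ipca.n_samples_seen_
ipca.partial_fit(X2)
seen_after_second = ipca.n_samples_seen_

# fit starts over: what was learned from earlier data is discarded.
ipca.fit(X2)
seen_after_fit = ipca.n_samples_seen_

print(seen_after_first, seen_after_second, seen_after_fit)
```

The sample counter grows across the partial_fit calls and goes back to the size of X2 alone after fit, which matches the replace-versus-accumulate behaviour described above.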

Upvotes: 5
