user184074

Reputation: 93

Why should I subclass BaseEstimator in a Scikit-Learn Pipeline?

The scikit-learn documentation gives examples of custom transformers that subclass both BaseEstimator and TransformerMixin. Why is BaseEstimator used as a base class in these examples?

To try to answer this myself, I removed BaseEstimator from the base classes of the ItemSelector class, and Python did not complain.

Upvotes: 2

Views: 1528

Answers (2)

Sadak

Reputation: 911

BaseEstimator provides, among other things, default implementations of the get_params and set_params methods (see the source code). These make the model searchable with GridSearchCV for automated parameter tuning, and let it behave well when combined with other estimators in a Pipeline.
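As a rough illustration of what those two methods give you, here is a minimal sketch. The class is only loosely modeled on the ItemSelector from the question, and the key values "body" and "subject" are made up for the example.

```python
from sklearn.base import BaseEstimator, TransformerMixin


# Hypothetical transformer, loosely modeled on the ItemSelector in the question.
# Mixins are listed before BaseEstimator in the base-class list.
class ItemSelector(TransformerMixin, BaseEstimator):
    def __init__(self, key):
        # get_params discovers parameters from the __init__ signature,
        # so each argument is stored under an attribute of the same name.
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.key]


selector = ItemSelector(key="body")

# Inherited from BaseEstimator: read the constructor parameters as a dict.
params_before = selector.get_params()

# Inherited from BaseEstimator: change parameters after construction.
selector.set_params(key="subject")
params_after = selector.get_params()

print(params_before)
print(params_after)
```

Neither get_params nor set_params is written in the class itself; both come from BaseEstimator, which is what tools like GridSearchCV rely on.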

Upvotes: 2

Harpal

Reputation: 12587

BaseEstimator provides an implementation of the get_params and set_params methods. Why is this needed? These methods make a model usable with GridSearchCV and ensure it behaves well when placed in a pipeline. This is just one of the applications of BaseEstimator.

In the example you provided, no grid search is performed, which is why it was not needed. It is included in most places (I believe) as best practice, essentially future-proofing your code in case a grid search is later added around the pipeline.
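To make the difference concrete, here is a small sketch under some assumptions: PlainScaler and TunableScaler are hypothetical classes invented for this example, and clone from sklearn.base stands in for the copying that a grid search does to an estimator before trying each parameter setting.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.pipeline import Pipeline


class PlainScaler:
    """Hypothetical transformer WITHOUT BaseEstimator: no get_params/set_params."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[value * self.factor for value in row] for row in X]


class TunableScaler(TransformerMixin, BaseEstimator):
    """Same hypothetical transformer, but inheriting get_params/set_params."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[value * self.factor for value in row] for row in X]


# clone() needs get_params, so the plain class is rejected.
try:
    clone(PlainScaler(factor=2.0))
    plain_is_clonable = True
except TypeError:
    plain_is_clonable = False

# The BaseEstimator subclass clones fine and keeps its parameters.
tunable_copy = clone(TunableScaler(factor=2.0))

# Inside a Pipeline, the parameter is reachable as "<step name>__<param name>",
# which is the naming a parameter grid would use.
pipe = Pipeline([("scale", TunableScaler(factor=2.0))])
has_nested_param = "scale__factor" in pipe.get_params()
pipe.set_params(scale__factor=3.0)

print(plain_is_clonable, tunable_copy.factor, has_nested_param)
```

So the class without BaseEstimator can still work when you call fit and transform on it directly, but it breaks as soon as something needs to copy it or set its parameters by name.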

Upvotes: 6
