Why should I subclass BaseEstimator in a Scikit-Learn Pipeline?

Question

In the scikit-learn documentation they give examples of custom Transformers which subclass both the BaseEstimator and TransformerMixin classes. I'm wondering, why is the BaseEstimator subclass used in these examples?

To try to answer this question myself, I removed BaseEstimator from the base classes of the ItemSelector class, but Python did not complain.

Harpal · Accepted Answer

BaseEstimator provides implementations of the get_params and set_params methods. Why are these needed? They make a model usable with GridSearchCV, which reads and sets an estimator's parameters through these two methods, and they ensure it behaves well when placed in a pipeline. This is just one of the uses of BaseEstimator.
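To make that concrete, here is a minimal sketch of what the inherited methods give you. The ItemSelector below is a simplified stand-in for the one in the documentation example, and the key names "body" and "subject" are made up for illustration.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone


class ItemSelector(BaseEstimator, TransformerMixin):
    """Simplified stand-in: pick one entry out of a dict-like input."""

    def __init__(self, key=None):
        # BaseEstimator finds parameters by inspecting the __init__ signature,
        # so each argument is stored unchanged under the same attribute name.
        self.key = key

    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained.
        return self

    def transform(self, X):
        return X[self.key]


selector = ItemSelector(key="body")

# get_params and set_params come from BaseEstimator, not from our class.
params = selector.get_params()
selector.set_params(key="subject")

# clone() builds a fresh, unfitted copy from the parameters;
# grid search relies on this to create one estimator per candidate.
selector_copy = clone(selector)
```

If BaseEstimator is dropped from the base classes, fit and transform still work, but get_params, set_params and clone are no longer available for this object.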

In the example you provided, no grid search is performed, which is why it was not needed. It is included in most places (I believe) as best practice, essentially future-proofing your code in case a grid search is later added to the pipeline.
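As a rough illustration of the grid search case, the sketch below tunes a parameter of a custom transformer inside a pipeline. FeatureClipper, its upper parameter, the step names and the synthetic data are all hypothetical and not part of the original example.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class FeatureClipper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: cap every feature value at `upper`."""

    def __init__(self, upper=1.0):
        self.upper = upper

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.minimum(X, self.upper)


# Small synthetic dataset, only so the example runs end to end.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

pipeline = Pipeline([
    ("clip", FeatureClipper()),
    ("model", LogisticRegression(max_iter=1000)),
])

# "<step name>__<parameter>" addresses a parameter of one pipeline step.
# The search clones the pipeline and applies each value via set_params.
search = GridSearchCV(pipeline, param_grid={"clip__upper": [1.0, 5.0]}, cv=3)
search.fit(X, y)

best_upper = search.best_params_["clip__upper"]
```

Without BaseEstimator in FeatureClipper's base classes, the search would be expected to fail when it tries to clone the pipeline, since the transformer would have no get_params method.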
