Accessing transformer functions in `sklearn` pipelines

Question

According to the sklearn.pipeline.Pipeline documentation:

The pipeline has all the methods that the last estimator in the pipeline has, i.e. if the last estimator is a classifier, the Pipeline can be used as a classifier. If the last estimator is a transformer, again, so is the pipeline.

The following example creates a dummy transformer with a custom, dummy function f:

from sklearn.pipeline import Pipeline

class C:
    def fit(self, X, y=None):
        print('fit')
        return self

    def transform(self, X):
        print('transform')
        return X

    def f(self):
        print('abc')

ppl = Pipeline([('C', C())])

I expected to be able to call the f function of the C transformer through the pipeline. However, calling ppl.f() raises AttributeError: 'Pipeline' object has no attribute 'f'.

Am I misinterpreting the documentation? Is there a good and reliable way to access the last transformer's functions?

Andreus · Accepted Answer

The Pipeline documentation slightly overstates things. The pipeline exposes only the standard estimator methods of its last estimator, such as predict(), fit_predict(), fit_transform(), transform(), decision_function(), and predict_proba().

It cannot expose any other functions, because it wouldn't know what to do with all the other steps in the pipeline. In most situations you pass (X) or possibly (X, y), and X and/or y must pass through every step in the pipeline, via either fit_transform() or transform().
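To make this concrete, here is a minimal sketch that checks which methods a pipeline exposes depending on its last step. The step names ("scale", "clf") and the choice of StandardScaler and LogisticRegression are my own assumptions for illustration; they are not from the question.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline ending in a classifier (illustrative choice of steps).
clf_pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# Pipeline ending in a transformer.
tf_pipe = Pipeline([("scale", StandardScaler())])

# hasattr() reflects which standard methods the pipeline delegates
# to its last step; no fitting is needed for this check.
clf_has_predict_proba = hasattr(clf_pipe, "predict_proba")
clf_has_transform = hasattr(clf_pipe, "transform")
tf_has_transform = hasattr(tf_pipe, "transform")
tf_has_predict = hasattr(tf_pipe, "predict")

print(clf_has_predict_proba, clf_has_transform)
print(tf_has_transform, tf_has_predict)
```

On a reasonably recent scikit-learn, I would expect the classifier pipeline to report predict_proba but no transform, and the transformer pipeline to report transform but no predict.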

It is fairly easy to access the last estimator, like this:

ppl.steps[-1][1].f()
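The same object can also be reached by step name or by indexing the pipeline itself. A small sketch, assuming a scikit-learn version where Pipeline supports indexing (0.21 or later, as far as I know); here f returns the string instead of printing it, purely so the result is easy to inspect:

```python
from sklearn.pipeline import Pipeline


class C:
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

    def f(self):
        # Returns instead of printing (a small change from the question).
        return "abc"


ppl = Pipeline([("C", C())])

last = ppl.steps[-1][1]          # (name, estimator) tuple, take the estimator
by_name = ppl.named_steps["C"]   # lookup by the step name given above
by_index = ppl[-1]               # integer indexing returns the estimator

result = last.f()
print(result)
print(by_name is last, by_index is last)
```

All three expressions should give you the very same C instance, so use whichever reads best in your code.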

But remember that doing so bypasses the previous steps in the pipeline: if you pass it X, X won't be scaled by your StandardScaler or whatever else you are doing earlier in the pipeline.
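If you do need to call the last step directly on data, one option is to run the earlier steps yourself first. The sketch below uses made-up toy data and hypothetical step names ("scale", "clf"), and it assumes a scikit-learn version where a pipeline can be sliced (0.21 or later, as far as I know):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, invented for illustration only.
X = np.array([[0.0, 1000.0], [1.0, 2000.0], [2.0, 3000.0], [3.0, 4000.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
pipe.fit(X, y)

# Normal route: X passes through every step.
via_pipeline = pipe.predict(X)

# Manual route: apply all steps except the last, then call the last step.
X_scaled = pipe[:-1].transform(X)
via_last_step = pipe[-1].predict(X_scaled)

# Bypassing route: the classifier sees unscaled X, which it was not trained on.
raw_to_last_step = pipe[-1].predict(X)

print(np.array_equal(via_pipeline, via_last_step))
```

The manual route should match the pipeline's own predictions, while the bypassing route gives no such guarantee because the classifier receives inputs on a different scale than it saw during fit.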
