Reputation: 31757
According to the sklearn.pipeline.Pipeline documentation:
The pipeline has all the methods that the last estimator in the pipeline has, i.e. if the last estimator is a classifier, the Pipeline can be used as a classifier. If the last estimator is a transformer, again, so is the pipeline.
The following example creates a dummy transformer with a custom dummy function f:
class C:
    def fit(self, X, y=None):
        print('fit')
        return self

    def transform(self, X):
        print('transform')
        return X

    def f(self):
        print('abc')

from sklearn.pipeline import Pipeline
ppl = Pipeline([('C', C())])
I was expecting to be able to access the f function of the C transformer; however, calling ppl.f() results in AttributeError: 'Pipeline' object has no attribute 'f'
Am I misinterpreting the documentation? Is there a good and reliable way to access the last transformer's functions?
Upvotes: 6
Views: 2028
Reputation: 2487
The Pipeline documentation slightly overstates things. The pipeline exposes the standard estimator methods of its last estimator, such as predict(), fit_predict(), fit_transform(), transform(), decision_function(), and predict_proba().
It cannot expose arbitrary other methods, because it wouldn't know what to do with the other steps in the pipeline. In most situations you pass (X) or possibly (X, y), and X and/or y must pass through every step in the pipeline, via either fit_transform() or transform().
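As a rough illustration of what "passing through every step" means, the sketch below chains transform() by hand and compares the result with the pipeline's own transform(). The two scalers and the toy array are only assumptions chosen for the example; this is not how Pipeline is implemented internally, just an equivalent view of its behaviour for this simple case.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data and step choice are assumptions made for this illustration.
X = np.array([[1.0], [2.0], [4.0]])
ppl = Pipeline([('std', StandardScaler()), ('minmax', MinMaxScaler())]).fit(X)

# Feed X through each fitted step in order, as the pipeline does for transform().
Xt = X
for name, step in ppl.steps:
    Xt = step.transform(Xt)

# The manual chain and the pipeline's transform() should agree.
matches = np.allclose(Xt, ppl.transform(X))
```

A custom method such as f() has no such chaining rule, which is why the pipeline does not forward it.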
It is fairly easy to access the last estimator, like this:
ppl.steps[-1][1].f()
But remember that doing so bypasses the previous steps in the pipeline (i.e., if you pass it X, it won't be scaled by your StandardScaler or whatever else happens earlier in the pipeline).
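Here is a minimal sketch of both points: reaching the last step, and what bypassing the earlier steps looks like. It assumes a reasonably recent scikit-learn (pipeline slicing such as ppl[:-1] was added in 0.21). The column_means() method is hypothetical and exists only to make the bypass visible, and f() returns its string instead of printing it so the result can be inspected.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

class C(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

    def f(self):
        return 'abc'

    def column_means(self, X):
        # Hypothetical custom method that depends on its input.
        return np.asarray(X).mean(axis=0)

X = np.array([[1.0], [2.0], [3.0]])
ppl = Pipeline([('scale', StandardScaler()), ('C', C())]).fit(X)

# Two equivalent ways to get the last step: by position or by name.
last = ppl.steps[-1][1]
same = ppl.named_steps['C']
f_result = last.f()

# Calling a custom method directly sees the raw, unscaled X.
raw_means = last.column_means(X)

# To respect the earlier steps, transform X with everything but the last step first.
X_scaled = ppl[:-1].transform(X)
scaled_means = last.column_means(X_scaled)
```

With the raw input the mean is 2.0; after running X through the scaler first it is approximately 0, which shows that the direct call skipped the scaling.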
Upvotes: 5