Reputation: 1312
I have the following pipeline:
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("kmeans", KMeans(n_clusters=50)),
    ("log_reg", LogisticRegression()),
])
pipeline.fit(X_train, y_train)
I want to access the labels of the kmeans step (or any other attribute of KMeans), but I don't know how. I tried print(kmeans.labels_) and even print(pipeline.labels_), but neither works: I get an error that the variables are undefined. How can I access the results of a particular stage in a pipeline?
Upvotes: 1
Views: 1728
Reputation: 16966
With sklearn 0.21 or later (0.21.2 at the time of writing), you can use the pipeline's __getitem__ to index its steps by name:
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# generate some data to play with
X, y = make_classification(
    n_informative=5, n_redundant=0, random_state=42)

pipeline = Pipeline([
    ("kmeans", KMeans(n_clusters=50)),
    ("log_reg", LogisticRegression(solver='lbfgs')),
])
pipeline.fit(X, y)

# look up the fitted KMeans step by name, then read its attribute
pipeline['kmeans'].labels_
# e.g. array([ 2, 42, 40, 38, ...]); exact labels vary between runs
For earlier versions, use pipeline.named_steps['kmeans'].labels_ instead.
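To make the options concrete, here is a minimal sketch of the different ways to reach the same fitted step. The small cluster count, n_init, random_state and max_iter values are my own choices to keep the example quick and repeatable; they are not from the question.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data: 100 samples, 20 features (make_classification defaults)
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

# Assumed settings (n_clusters=5, n_init=10, random_state=0, max_iter=1000)
# are illustrative only
pipeline = Pipeline([
    ("kmeans", KMeans(n_clusters=5, n_init=10, random_state=0)),
    ("log_reg", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

by_name = pipeline["kmeans"]                     # needs sklearn >= 0.21
by_position = pipeline[0]                        # needs sklearn >= 0.21
by_named_steps = pipeline.named_steps["kmeans"]  # also works on older versions
by_steps_list = pipeline.steps[0][1]             # steps is a list of (name, estimator)

# All four expressions refer to the same fitted KMeans object
same_object = by_name is by_position is by_named_steps is by_steps_list

# Any fitted attribute of the step is then available
labels = by_name.labels_            # cluster index of each training sample
centers = by_name.cluster_centers_  # one row per cluster
inertia = by_name.inertia_          # sum of squared distances to closest center
```

Note that labels_ only describes the data the pipeline was fitted on. For new samples, pipeline['kmeans'].predict(X_new) should give the cluster assignments.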
Upvotes: 1