Yagel

Reputation: 1312

Scikit's Pipeline - how to access the results of a particular stage

I have the following pipeline:

from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("kmeans", KMeans(n_clusters=50)),
    ("log_reg", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

I want to access the KMeans labels (or any other attribute of the fitted KMeans), but I don't know how. I tried print(kmeans.labels_) and even print(pipeline.labels_), but neither works: I get errors saying they are undefined. How can I access the results of a particular stage in a pipeline?

Upvotes: 1

Views: 1728

Answers (1)

Venkatachalam

Reputation: 16966

With the latest version (0.21.2) of sklearn, you can use the pipeline's __getitem__ to index its steps by name:

# sklearn.datasets.samples_generator is deprecated (removed in 0.24);
# make_classification can be imported from sklearn.datasets directly
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# generate some data to play with
X, y = make_classification(
    n_informative=5, n_redundant=0, random_state=42)

pipeline = Pipeline([
    ("kmeans", KMeans(n_clusters=50)),
    ("log_reg", LogisticRegression(solver='lbfgs')),
])
pipeline.fit(X, y)
pipeline['kmeans'].labels_

# array([ 2, 42, 40, 38, ...])
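Indexing is not limited to step names. A minimal sketch, assuming scikit-learn 0.21 or later: integer positions also work, and a slice returns a sub-pipeline, so pipeline[:-1].transform(X) gives the output of the KMeans stage (distances from each sample to each cluster center), i.e. the features LogisticRegression was actually trained on. The n_init, random_state and max_iter values are my own additions for reproducibility and to avoid warnings; they are not from the question.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data, same generator settings as in the answer above
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

# n_init, random_state and max_iter are assumed values, chosen for reproducibility
pipeline = Pipeline([
    ("kmeans", KMeans(n_clusters=50, n_init=10, random_state=0)),
    ("log_reg", LogisticRegression(solver="lbfgs", max_iter=1000)),
])
pipeline.fit(X, y)

# Index a step by integer position instead of by name
kmeans_step = pipeline[0]
print(kmeans_step.cluster_centers_.shape)  # (50, 20): 50 clusters, 20 features

# Slice off the final estimator to get a sub-pipeline of the preceding steps.
# Its transform() returns what the last step received as input; for KMeans,
# transform() gives each sample's distance to each cluster center.
features_for_log_reg = pipeline[:-1].transform(X)
print(features_for_log_reg.shape)  # (100, 50)
```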

For versions before 0.21, use pipeline.named_steps['kmeans'] instead.
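A sketch of the named_steps route, which still works in newer versions, plus an alternative: keep your own reference to the KMeans object before putting it in the pipeline. That alternative relies on Pipeline.fit fitting the steps in place, which is the behaviour when no memory cache is configured (with caching enabled, transformers are cloned, so the original variable would stay unfitted). As before, n_init, random_state and max_iter are assumed values, not from the question.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

# Keep a reference to the step before putting it in the pipeline
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0)
pipeline = Pipeline([
    ("kmeans", kmeans),
    ("log_reg", LogisticRegression(solver="lbfgs", max_iter=1000)),
])
pipeline.fit(X, y)

# named_steps maps each step name to its fitted estimator
labels = pipeline.named_steps["kmeans"].labels_
print(labels.shape)  # (100,)

# With no `memory` cache, Pipeline.fit fits the steps in place,
# so the original `kmeans` variable refers to the same fitted object
print(pipeline.named_steps["kmeans"] is kmeans)
print(kmeans.labels_[:5])
```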

Upvotes: 1
