Reputation: 52
I'm trying to run the following code, but I get a "'Pipeline' object is not subscriptable" error when I do pipe['count'].
```python
>>> from sklearn.feature_extraction.text import TfidfTransformer
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> from sklearn.pipeline import Pipeline
>>> import numpy as np
>>> corpus = ['this is the first document',
...           'this document is the second document',
...           'and this is the third one',
...           'is this the first document']
>>> vocabulary = ['this', 'document', 'first', 'is', 'second', 'the',
...               'and', 'one']
>>> pipe = Pipeline([('count', CountVectorizer(vocabulary=vocabulary)),
...                  ('tfid', TfidfTransformer())]).fit(corpus)
>>> pipe['count'].transform(corpus).toarray()
array([[1, 1, 1, 1, 0, 1, 0, 0],
       [1, 2, 0, 1, 1, 1, 0, 0],
       [1, 0, 0, 1, 0, 1, 1, 1],
       [1, 1, 1, 1, 0, 1, 0, 0]])
>>> pipe['tfid'].idf_
array([1.        , 1.22314355, 1.51082562, 1.        , 1.91629073,
       1.        , 1.91629073, 1.91629073])
>>> pipe.transform(corpus).shape
(4, 8)
```
Upvotes: 3
Views: 3914
Reputation: 186
Instead of `pipe['count']`, you can try `pipe.named_steps['count']`. To access your TF-IDF step (named 'tfid' in your pipeline), try `pipe.named_steps['tfid']`.
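Here is a minimal sketch of that approach, reusing the corpus and vocabulary from the question. As far as I know, indexing a pipeline by step name (`pipe['count']`) only works from scikit-learn 0.21 onward, which would explain the error on an older install; `named_steps` should work either way. The variable names `count_step`, `tfidf_step`, `counts`, and `idf` are just illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

corpus = ['this is the first document',
          'this document is the second document',
          'and this is the third one',
          'is this the first document']
vocabulary = ['this', 'document', 'first', 'is', 'second', 'the',
              'and', 'one']

pipe = Pipeline([('count', CountVectorizer(vocabulary=vocabulary)),
                 ('tfid', TfidfTransformer())]).fit(corpus)

# named_steps maps each step name to its fitted estimator
count_step = pipe.named_steps['count']
tfidf_step = pipe.named_steps['tfid']

# Raw term counts from the first step, as a dense array
counts = count_step.transform(corpus).toarray()

# Learned inverse document frequencies from the second step
idf = tfidf_step.idf_

# Alternative: pipe.steps is a plain list of (name, estimator) tuples
same_count_step = dict(pipe.steps)['count']
```

If upgrading is an option, a newer scikit-learn release should also make the original `pipe['count']` syntax work unchanged.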
Upvotes: 5