Reputation: 37
I'm using the movie_reviews data with CountVectorizer. I want to get the vocabulary as a dictionary that maps each unique word to its index, as in the code below:
from sklearn.feature_extraction.text import CountVectorizer
import nltk
cv = CountVectorizer(tokenizer=nltk.word_tokenize , stop_words='english')
movie_train_cv = cv.fit_transform(movie_train.data)
movie_train_cv.vocabulary_
The last line raises an error: AttributeError: vocabulary not found. Please let me know the correct syntax.
I want output like the following example:
sents = ['A rose is a rose is a rose is a rose.',
'Oh, what a fine day it is.',
"It ain't over till it's over, I tell you!!"]
#sents turned into sparse vector of word frequency counts
sents_counts = foovec.fit_transform(sents)
#foovec now contains vocab dictionary which maps unique words to indexes
foovec.vocabulary_
Here is the output of this code: {'a': 4, 'rose': 14, 'is': 9, '.': 3, 'oh': 12, ',': 2, 'what': 17, 'fine': 7, 'day': 6, 'it': 10, 'ai': 5, "n't": 11, 'over': 13, 'till': 16, "'s": 1, 'i': 8, 'tell': 15, 'you': 18, '!': 0}
Upvotes: 2
Views: 2016
Reputation: 919
Calling fit_transform on a CountVectorizer returns a sparse document-term matrix, as discussed in the documentation. The vocabulary_ attribute is on the CountVectorizer itself. The returned matrix does not have a vocabulary_ attribute.
To access the vocabulary of the CountVectorizer after you've fitted it, simply do the following:
vocab = cv.vocabulary_
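To make the distinction concrete, here is a minimal sketch, assuming scikit-learn's default tokenizer rather than nltk.word_tokenize (so single-character tokens and punctuation are dropped, and the indexes differ from the question's output). The names docs and counts are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for movie_train.data
docs = [
    "A rose is a rose is a rose is a rose.",
    "Oh, what a fine day it is.",
]

cv = CountVectorizer()           # default tokenizer, no stop-word filtering
counts = cv.fit_transform(docs)  # sparse document-term matrix

# The matrix returned by fit_transform carries no vocabulary_ attribute
matrix_has_vocab = hasattr(counts, "vocabulary_")

# The fitted vectorizer holds the word -> column index mapping
vocab = cv.vocabulary_

# Use the mapping to look up a word's count in the first document
rose_count = counts[0, vocab["rose"]]
print(matrix_has_vocab, vocab, rose_count)
```

The same pattern should apply to the question's code: keep cv and movie_train_cv as separate objects, and read the dictionary from cv.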
Upvotes: 4