AttributeError: vocabulary not found

Question

I'm using the movie_reviews data with CountVectorizer. I want to get the vocabulary as a dictionary that maps each unique word to its index, as in the code below:

from sklearn.feature_extraction.text import CountVectorizer
import nltk
cv = CountVectorizer(tokenizer=nltk.word_tokenize, stop_words='english')
movie_train_cv = cv.fit_transform(movie_train.data)

movie_train_cv.vocabulary_

The last line raises AttributeError: vocabulary not found. Please let me know the correct syntax.

I want something like this:

sents = ['A rose is a rose is a rose is a rose.',
         'Oh, what a fine day it is.',
         "It ain't over till it's over, I tell you!!"]

# sents turned into a sparse vector of word frequency counts
sents_counts = foovec.fit_transform(sents)
# foovec now contains a vocab dictionary which maps unique words to indexes
foovec.vocabulary_

Here is the output of this code: {'a': 4, 'rose': 14, 'is': 9, '.': 3, 'oh': 12, ',': 2, 'what': 17, 'fine': 7, 'day': 6, 'it': 10, 'ai': 5, "n't": 11, 'over': 13, 'till': 16, "'s": 1, 'i': 8, 'tell': 15, 'you': 18, '!': 0}

Owen · Accepted Answer

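Calling fit_transform on a CountVectorizer returns the document-term matrix of token counts (a SciPy sparse matrix), as discussed in the documentation.

The vocabulary_ attribute is on the CountVectorizer itself, and it is populated once the vectorizer has been fitted. The returned matrix does not have a vocabulary_ attribute, which is why movie_train_cv.vocabulary_ raises AttributeError.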

To access the vocabulary of the CountVectorizer after you've fitted it, read the attribute from the vectorizer object:

vocab = cv.vocabulary_
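For illustration, here is a minimal, self-contained sketch of the difference between the two objects. It reuses the three example sentences from the question. As an assumption made to keep it runnable without downloading NLTK data, it uses CountVectorizer's default tokenizer rather than nltk.word_tokenize, so the resulting vocabulary will not match the output quoted in the question (the default tokenizer drops punctuation and single-character tokens). The name index_to_word is purely illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

sents = [
    "A rose is a rose is a rose is a rose.",
    "Oh, what a fine day it is.",
    "It ain't over till it's over, I tell you!!",
]

# Assumption: default tokenizer instead of nltk.word_tokenize,
# so no NLTK resources are needed to run this sketch.
cv = CountVectorizer()

# fit_transform returns the document-term count matrix (sparse),
# one row per sentence and one column per vocabulary entry.
counts = cv.fit_transform(sents)

# The matrix itself carries no vocabulary; this is the source of the error.
matrix_has_vocab = hasattr(counts, "vocabulary_")

# The fitted vectorizer holds the word -> column index mapping.
vocab = cv.vocabulary_

# Illustrative helper: invert the mapping to look up a word by column index.
index_to_word = sorted(vocab, key=vocab.get)

print(matrix_has_vocab)
print(vocab)
print(counts[0, vocab["rose"]])  # count of "rose" in the first sentence
```

The same pattern applies to the code in the question: keep movie_train_cv for the counts, and read cv.vocabulary_ for the dictionary.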
