Naresh MG

Reputation: 723

python CountVectorizer() vocabulary_ get method returns None

I have this piece of code, written following the documentation at http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html:

from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer()

my_bunch = load_files("c:\\temp\\billing_test\\")

my_data = my_bunch['data']
print(my_bunch.keys())
print('target_names', my_bunch['target_names'])
print('length of data', len(my_bunch['data']))


X_train_counts = count_vect.fit_transform(my_data)
print(X_train_counts.shape)

print(count_vect.vocabulary_.get(u'algorithm'))

The output is as follows:

dict_keys(['target', 'filenames', 'target_names', 'data', 'DESCR'])
target_names ['false', 'true']
length of data 920
(920, 8773)
None

I wonder why "None" is printed at the bottom, after (920, 8773).

I have around 460 text documents in each of the folders "true" and "false".

Thanks,

Upvotes: 1

Views: 2382

Answers (1)

Farseer

Reputation: 4172

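Because the word you looked up never appeared in your documents. vocabulary_ is a plain dict that maps each term seen during fitting to its column index, and dict.get returns None for a key that is not present.

Check the spelling of the lookup term first (for example 'algoritham' instead of 'algorithm'). Keep in mind that even a correctly spelled word returns None if it occurs in none of the 920 documents.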
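
A minimal sketch of the lookup behaviour, in case it helps to see it in isolation. The two-sentence corpus and the lookup helper are made up for illustration and stand in for the documents returned by load_files; only CountVectorizer and its vocabulary_ attribute come from the question.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in corpus; the real documents come from load_files.
docs = [
    "The invoice total was billed twice",
    "Billing algorithm applied the discount",
]

count_vect = CountVectorizer()
count_vect.fit_transform(docs)

# vocabulary_ is a plain dict: term -> column index in the count matrix.
vocab = count_vect.vocabulary_


def lookup(term):
    """Return the column index of term, or None if it was never seen."""
    return vocab.get(term)


present = lookup("algorithm")   # an index, because the word occurs in docs
missing = lookup("algoritham")  # None, because this spelling never occurs

print(present, missing)
# A membership test distinguishes "absent" from any valid index.
print("algorithm" in vocab, "algoritham" in vocab)
```

Testing membership with "in" before calling get, or printing a few entries of sorted(vocab), is a quick way to see which terms the vectorizer actually extracted from your own files.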

Upvotes: 5
