Far

Reputation: 203

How to get a word's id in the bag-of-words vocabulary, given the word?

I have applied a bag-of-words model to a set of messages as follows:

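    from sklearn.feature_extraction.text import CountVectorizer

    # Fit the vocabulary on all messages, then vectorize a single message
    bow_transformer = CountVectorizer(analyzer=split_into_lemmas).fit(messages['message'])
    B4 = bow_transformer.transform([msg4])
    print(B4)
    # Look up the words stored at two of the feature indices
    print(bow_transformer.get_feature_names()[6736])
    print(bow_transformer.get_feature_names()[8013])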

    (0, 1158) 1
    (0, 1899) 1
    (0, 2897) 1
    (0, 2927) 1
    (0, 4021) 1
    (0, 6736) 2
    (0, 7111) 1
    (0, 7698) 1
    (0, 8013) 2
    say
    u

What I need is the reverse: given a word like "say", how do I get its id, 6736? In other words, the inverse of what bow_transformer.get_feature_names()[6736] does.

Upvotes: 0

Views: 295

Answers (1)

elyase

Reputation: 40973

Use the fitted vectorizer's vocabulary_ attribute, a dict that maps each term to its feature (column) index:

    >>> bow_transformer.vocabulary_.get('say')
    6736
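For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same lookup. The toy corpus and the names docs, vectorizer, and id_to_word are assumptions made for illustration, and it uses the default analyzer rather than the asker's split_into_lemmas, so the ids will differ from those in the question.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for messages['message'].
docs = ["say hello", "say it again", "hello again"]

vectorizer = CountVectorizer().fit(docs)

# vocabulary_ is a plain dict: term -> column index in the count matrix.
vocab = vectorizer.vocabulary_

# Word -> id. dict.get returns None for words that were never seen in fit().
say_id = vocab.get("say")
missing_id = vocab.get("unseen")

# Id -> word, built by inverting the same dict.
id_to_word = {idx: word for word, idx in vocab.items()}

print(say_id, id_to_word[say_id], missing_id)
```

Using .get rather than vocabulary_['say'] avoids a KeyError for out-of-vocabulary words. Note that the keys are whatever the analyzer produced (lowercased tokens here, lemmas in the question), so the word has to be looked up in that same form.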

Upvotes: 3
