Reputation: 65
How can I get the TF-IDF value of a specific word using TfidfVectorizer? For example, my code is:
from sklearn.feature_extraction.text import TfidfVectorizer

docs = []
docs.append("this is sentence number one")
docs.append("this is sentence number two")
vectorizer = TfidfVectorizer(norm='l2', min_df=0, use_idf=True, smooth_idf=True, stop_words='english', sublinear_tf=True)
sklearn_representation = vectorizer.fit_transform(docs)
Now, how can I get the TF-IDF value of "sentence" in the second sentence (docs[1])?
Upvotes: 1
Views: 2379
Reputation: 95993
You need to use the vectorizer's vocabulary_ attribute, which is a mapping of terms to feature indices.
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> docs = []
>>> docs.append("this is sentence number one")
>>> docs.append("this is sentence number two")
>>> vectorizer = TfidfVectorizer(norm='l2',min_df=0, use_idf=True, smooth_idf=True, stop_words='english', sublinear_tf=True)
>>> x = vectorizer.fit_transform(docs)
>>> x.todense()
matrix([[ 0.70710678,  0.70710678],
        [ 0.70710678,  0.70710678]])
>>> vectorizer.vocabulary_['sentence']
1
>>> c = vectorizer.vocabulary_['sentence']
>>> x[:,c]
<2x1 sparse matrix of type '<class 'numpy.float64'>'
        with 2 stored elements in Compressed Sparse Row format>
>>> x[:,c].todense()
matrix([[ 0.70710678],
        [ 0.70710678]])
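To get the single value asked about (the word "sentence" in docs[1]), index the matrix by row and column together. Below is a minimal sketch of that lookup. The helper name tfidf_value is hypothetical and not part of scikit-learn, and min_df is left at its default of 1 on the assumption that recent scikit-learn releases may reject an integer min_df of 0.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "this is sentence number one",
    "this is sentence number two",
]

# Same settings as in the question, except min_df is left at its default (1).
# Assumption: newer scikit-learn versions may not accept the integer 0 here.
vectorizer = TfidfVectorizer(norm="l2", use_idf=True, smooth_idf=True,
                             stop_words="english", sublinear_tf=True)
x = vectorizer.fit_transform(docs)


def tfidf_value(vectorizer, matrix, doc_index, term):
    """Hypothetical helper: TF-IDF weight of `term` in document `doc_index`.

    Returns 0.0 when the term is not in the fitted vocabulary
    (for example, because it was removed as a stop word).
    """
    col = vectorizer.vocabulary_.get(term)
    if col is None:
        return 0.0
    # Row = document, column = feature index of the term.
    return float(matrix[doc_index, col])


value = tfidf_value(vectorizer, x, 1, "sentence")
print(value)
```

For these two documents the printed value should be about 0.7071, matching the second row of the column shown above. Using vocabulary_.get instead of vocabulary_[...] avoids a KeyError for words that never made it into the vocabulary.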
Upvotes: 1