I would like to get the matrix out of a TfidfVectorizer object from sklearn. Here is my code:
from sklearn.feature_extraction.text import TfidfVectorizer
text = ["The quick brown fox jumped over the lazy dog.",
"The dog.",
"The fox"]
vectorizer = TfidfVectorizer()
vectorizer.fit_transform(text)
Here is what I tried, along with the errors I got back:
vectorizer.toarray()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-117-76146e626284> in <module>()
----> 1 vectorizer.toarray()
AttributeError: 'TfidfVectorizer' object has no attribute 'toarray'
Another attempt:
vectorizer.todense()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-118-6386ee121184> in <module>()
----> 1 vectorizer.todense()
AttributeError: 'TfidfVectorizer' object has no attribute 'todense'
Upvotes: 3
Reputation: 88236
Note that vectorizer.fit_transform returns the document-term matrix that you want to obtain. So save what it returns and call todense on that result, as it will be in sparse format:
Returns: X : sparse matrix, [n_samples, n_features]. Tf-idf-weighted document-term matrix.
a = vectorizer.fit_transform(text)
a.todense()
matrix([[0.36388646, 0.27674503, 0.27674503, 0.36388646, 0.36388646,
0.36388646, 0.36388646, 0.42983441],
[0. , 0.78980693, 0. , 0. , 0. ,
0. , 0. , 0.61335554],
[0. , 0. , 0.78980693, 0. , 0. ,
0. , 0. , 0.61335554]])
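If a plain NumPy array is more convenient than the numpy.matrix that todense returns, the sparse result also has a toarray method. A minimal sketch, reusing the text list from the question; the names X and dense are only illustrative:

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

text = ["The quick brown fox jumped over the lazy dog.",
        "The dog.",
        "The fox"]

vectorizer = TfidfVectorizer()
# fit_transform returns a SciPy sparse matrix; the vectorizer itself
# has no toarray/todense, which is why the question's calls failed.
X = vectorizer.fit_transform(text)

dense = X.toarray()  # plain numpy.ndarray instead of numpy.matrix
print(sparse.issparse(X))
print(type(dense).__name__, dense.shape)
```

Both toarray and todense hold the same values; they differ only in the type of the returned object.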
Upvotes: 5
Reputation: 21709
.fit_transform itself returns a document-term matrix. So you do:
matrix = vectorizer.fit_transform(text)
Use matrix.todense() to convert the sparse matrix to a dense one.
matrix.shape will give you the shape of the matrix.
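To see which term each column stands for, the fitted vectorizer's vocabulary_ attribute maps every term to its column index. A rough sketch, assuming the same text list as in the question; n_docs, n_terms and terms are illustrative names:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

text = ["The quick brown fox jumped over the lazy dog.",
        "The dog.",
        "The fox"]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(text)

# Rows are documents, columns are vocabulary terms.
n_docs, n_terms = matrix.shape

# vocabulary_ maps term -> column index; sort terms by that index
# so the list lines up with the matrix columns.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(n_docs, n_terms)
print(terms)
```

With the default settings, the text is lowercased and single-character tokens are dropped, so the three sentences yield eight distinct terms.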
Upvotes: 2