Reputation: 57
I want to perform Singular Value Decomposition on a TF-IDF matrix, but the TF-IDF model gives me each document as a sparse list of (term index, score) pairs, like this:
[(1,0.2) , (2,0.3) , (6,0.1) ...]
[(3,0.2) , (5,0.3) , (10,0.1) ...]
So the code
u, s, v = svd(corpus_tfidf)
will not work on it.
I want a TF-IDF matrix that contains only the scores, not the term indices.
I have calculated TF-IDF like this:
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]
Upvotes: 3
Views: 3012
Reputation: 3308
If you use gensim to generate the TF-IDF scores, you can use matutils to convert the sparse TF-IDF representation to a dense NumPy ndarray and back.
from gensim import matutils
tfidf_dense = matutils.corpus2dense(corpus_tfidf, num_terms).T
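where num_terms is the number of unique terms in your corpus. The trailing .T transposes the result so that each row is a document and each column is a term. num_terms can be calculated this way: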
num_terms = len(corpus_tfidf.obj.idfs)
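To see what the conversion does and how the result feeds into SVD, here is a minimal sketch that uses only NumPy. It does not call gensim: the helper sparse_docs_to_dense is a hypothetical hand-rolled stand-in for the dense conversion, the toy scores are made up to match the shape shown in the question, and num_terms = 11 is an assumed vocabulary size (largest term index plus one).

```python
import numpy as np

def sparse_docs_to_dense(docs, num_terms):
    """Build a (num_docs, num_terms) array from lists of (term_index, score) pairs.

    Hypothetical helper for illustration; terms absent from a document stay 0.
    """
    dense = np.zeros((len(docs), num_terms), dtype=np.float64)
    for row, doc in enumerate(docs):
        for term_index, score in doc:
            dense[row, term_index] = score
    return dense

# Toy data in the same (term index, score) format as the question.
toy_tfidf = [
    [(1, 0.2), (2, 0.3), (6, 0.1)],
    [(3, 0.2), (5, 0.3), (10, 0.1)],
]
num_terms = 11  # assumed vocabulary size: largest term index + 1

# Rows are documents, columns are terms.
tfidf_dense = sparse_docs_to_dense(toy_tfidf, num_terms)

# Reduced SVD: tfidf_dense is approximately u @ diag(s) @ vt.
u, s, vt = np.linalg.svd(tfidf_dense, full_matrices=False)
```

A dense array stores a value for every (document, term) pair, so it may not fit in memory for a large vocabulary. In that case gensim's matutils.corpus2csc, which returns a SciPy sparse matrix, combined with a sparse SVD routine is probably the better fit.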
Upvotes: 4