mahshid

Reputation: 57

Calculate SVD on a TF-IDF matrix

I want to perform Singular Value Decomposition on a TF-IDF matrix, but the TF-IDF model gives me a sparse representation like this, with one list of (term index, score) pairs per document:

[(1,0.2) , (2,0.3) , (6,0.1) ...]
[(3,0.2) , (5,0.3) , (10,0.1) ...]

So the code u,s,v = svd(corpus_tfidf) does not work on it. I want a TF-IDF matrix that contains only the scores, not the term indices.

I have calculated TF-IDF like this:

from gensim import models

tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]

Upvotes: 3

Views: 3012

Answers (1)

Eduard Ilyasov

Reputation: 3308

If you use gensim to generate the TF-IDF representation, you can use matutils to convert it to a dense numpy ndarray and back.

from gensim import matutils
tfidf_dense = matutils.corpus2dense(corpus_tfidf, num_terms).T

where num_terms is the number of unique terms in your corpus. It can be calculated this way:

num_terms = len(corpus_tfidf.obj.idfs)
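To make the conversion concrete, here is a minimal sketch that builds the dense matrix by hand with plain numpy and then runs the SVD on it. It does not use gensim; sparse_docs_to_dense is a hypothetical helper written only for illustration, the toy data reuses the pairs shown in the question, and num_terms = 11 is an assumption that fits this toy data. The result has one row per document and one column per term, which is the orientation the transposed corpus2dense output above should also have.

```python
import numpy as np

def sparse_docs_to_dense(docs, num_terms):
    """Hypothetical helper: turn per-document lists of (term_index, score)
    pairs into a dense array of shape (num_docs, num_terms)."""
    dense = np.zeros((len(docs), num_terms), dtype=np.float64)
    for row, doc in enumerate(docs):
        for term_index, score in doc:
            dense[row, term_index] = score
    return dense

# Toy data in the same format as the question's output.
docs = [
    [(1, 0.2), (2, 0.3), (6, 0.1)],
    [(3, 0.2), (5, 0.3), (10, 0.1)],
]
num_terms = 11  # assumption: highest term index + 1 for this toy data

tfidf_dense = sparse_docs_to_dense(docs, num_terms)

# full_matrices=False gives the reduced SVD, which is usually enough here.
u, s, vt = np.linalg.svd(tfidf_dense, full_matrices=False)
```

Note that numpy returns the third factor already transposed, so the matrix is recovered as u @ np.diag(s) @ vt. If the term ids in your corpus have gaps, counting the idfs entries may give a value smaller than the highest id + 1, so it may be worth checking num_terms against your dictionary.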

Upvotes: 4
