AyaZoghby

Reputation: 341

How to set the values of a TfidfModel in gensim manually

In the following Python code:

from gensim import models

tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]

I already have a list of lists of TF-IDF values, one list per document in the corpus, calculated using my own equations, and I want to fill corpus_tfidf with them manually.

How can I use these values to fill corpus_tfidf instead of having gensim recalculate them?

I then want to pass my values to the gensim LSI and LDA models.

Upvotes: 1

Views: 740

Answers (1)

Ido S

Reputation: 1352

It seems to me that if you manually assign the model's idfs attribute, you should be able to transform a corpus without re-fitting the model. Hope this helps.

Self-contained example:

from gensim.models import TfidfModel
from gensim.corpora import Dictionary

# Trained version: the IDFs are fitted from the corpus
corpus = ['cow', 'brown thing', 'cow thing']
corpus = [x.split() for x in corpus]
dct = Dictionary(corpus)
corpus_as_bow = [dct.doc2bow(x) for x in corpus]
model_trained = TfidfModel(corpus_as_bow)
corpus_tfidf_trained = model_trained[corpus_as_bow]

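# Untrained version: create an empty model and assign the IDFs by hand
model_not_trained = TfidfModel()
# Keys are term ids from dct; values here are log2(3 / document frequency)
model_not_trained.idfs = {0: 0.5849625007211562, 1: 1.5849625007211563, 2: 0.5849625007211562}
corpus_tfidf_not_trained = model_not_trained[corpus_as_bow]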

# Check that both models produce the same weights
print(list(corpus_tfidf_trained) == list(corpus_tfidf_not_trained))

True
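
If you already have the complete TF-IDF weights rather than just the IDFs, you may not need TfidfModel at all: a gensim corpus is an iterable of documents, each one a list of (term_id, weight) tuples. Here is a minimal sketch, assuming your values are stored as one dense row per document with one column per term id. The function name dense_to_gensim and the sample numbers are made up for illustration.

```python
def dense_to_gensim(rows, eps=1e-12):
    """Convert dense per-document weight rows to gensim's sparse format.

    Assumes rows[d][t] is the weight of term id t in document d.
    Weights whose magnitude is at most eps are dropped, since sparse
    vectors normally omit zeros.
    """
    return [
        [(term_id, float(w)) for term_id, w in enumerate(row) if abs(w) > eps]
        for row in rows
    ]


# Made-up weights for 3 documents over a 3-term vocabulary
my_tfidf = [
    [1.0, 0.0, 0.0],
    [0.0, 0.9, 0.4],
    [0.7, 0.0, 0.7],
]
corpus_tfidf = dense_to_gensim(my_tfidf)
print(corpus_tfidf[1])
```

The resulting list should be usable wherever gensim expects a corpus, for example LsiModel(corpus_tfidf, id2word=dct, num_topics=2), provided the column indices match the term ids in your Dictionary.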

Upvotes: 2
