Reputation: 341
In the Python code:
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]
I want to fill corpus_tfidf manually, because I already have a list of lists of tf-idf values for each document in the corpus, calculated with my own equations.
How can I use those values for corpus_tfidf instead of having gensim recalculate them?
I then want to pass my values to the gensim LSI and LDA models.
Upvotes: 1
Views: 740
Reputation: 1352
It seems to me that if you manually assign the model's idfs attribute, you should be able to transform a corpus without re-fitting the model. Hope this helps.
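For context, here is a minimal sketch of how an idfs mapping of that shape could be built by hand. It assumes gensim's default global weighting, log2(total_docs / doc_freq). The helper name idfs_from_doc_freqs is hypothetical, and your own equations would replace its body.

```python
import math

def idfs_from_doc_freqs(doc_freqs, total_docs):
    """Hypothetical helper: map term id -> log2(total_docs / doc_freq)."""
    return {term_id: math.log2(total_docs / df) for term_id, df in doc_freqs.items()}

# Three documents; terms 0 and 2 each occur in two of them, term 1 in one.
idfs = idfs_from_doc_freqs({0: 2, 1: 1, 2: 2}, total_docs=3)
print(idfs)
```

The keys are the integer term ids from the dictionary, and the values are whatever weight you want each term to carry.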
Self-contained example:
from gensim.models import TfidfModel
from gensim.corpora import Dictionary
# trained version
corpus = ['cow', 'brown thing', 'cow thing']
corpus = [x.split() for x in corpus]
dct = Dictionary(corpus)
corpus_as_bow = [dct.doc2bow(x) for x in corpus]
model_trained = TfidfModel(corpus_as_bow)
corpus_tfidf_trained = model_trained[corpus_as_bow]
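# not trained version: create an empty model and assign the idfs by hand
# (keys are term ids from dct; values here equal log2(num_docs / doc_freq))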
model_not_trained = TfidfModel()
model_not_trained.idfs = {0: 0.5849625007211562, 1: 1.5849625007211563, 2: 0.5849625007211562}
corpus_tfidf_not_trained = model_not_trained[corpus_as_bow]
# check equivalence
list(corpus_tfidf_trained) == list(corpus_tfidf_not_trained)
True
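If the per-document weights are already computed, another route may be to skip TfidfModel entirely. A gensim corpus is an iterable of sparse vectors, each a list of (term_id, weight) tuples, so hand-made values in that form can be given to the models as the corpus. A minimal sketch, assuming the precomputed values are dense rows indexed by term id (my_tfidf and to_sparse_corpus are hypothetical names):

```python
def to_sparse_corpus(dense_rows, eps=1e-12):
    """Convert dense per-document weight rows into lists of (term_id, weight) tuples."""
    return [
        [(term_id, float(w)) for term_id, w in enumerate(row) if abs(w) > eps]
        for row in dense_rows
    ]

# Hypothetical precomputed weights: one row per document, one column per term id.
my_tfidf = [
    [1.0, 0.0, 0.0],
    [0.0, 0.9, 0.4],
    [0.7, 0.0, 0.7],
]
corpus_tfidf = to_sparse_corpus(my_tfidf)
print(corpus_tfidf)
```

The result could then go wherever tfidf[corpus] would have gone, for example models.LsiModel(corpus_tfidf, id2word=dct, num_topics=2). LDA is usually trained on raw bag-of-words counts, so whether to feed it tf-idf weights is a judgment call.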
Upvotes: 2