Reputation: 4558
I have a tf-idf matrix already, with rows for terms and columns for documents. Now I want to train an LDA model on this term-document matrix. The first step seems to be using gensim.matutils.Dense2Corpus to convert the matrix into the corpus format. But how do I construct the id2word parameter? I have the list of terms (#terms == #rows), but I don't know the format of the dictionary, so I cannot build it with functions like gensim.corpora.Dictionary.load_from_text. Any suggestions? Thank you.
Upvotes: 3
Views: 587
Reputation: 4266
id2word must map each id (integer) to a term (string). In other words, it must support id2word[123] == 'koala'.
A plain Python dict is the easiest option.
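A minimal sketch of building such a dict from the term list, assuming the list is in the same order as the matrix rows (so row i is term i). The example terms and weights are made up, and the helper columns_to_corpus is hypothetical: it only stands in for Dense2Corpus so the snippet runs without gensim installed, producing the same shape of data (one list of (term_id, weight) pairs per document, non-zero entries only).

```python
# Hypothetical example data: rows are terms, columns are documents.
terms = ["koala", "eucalyptus", "tree", "zoo"]
tfidf = [
    [0.5, 0.0, 0.3],  # koala
    [0.2, 0.0, 0.0],  # eucalyptus
    [0.0, 0.7, 0.1],  # tree
    [0.0, 0.4, 0.0],  # zoo
]

# id2word maps row index -> term string, so id2word[0] == "koala".
id2word = dict(enumerate(terms))


def columns_to_corpus(matrix, eps=1e-9):
    """Hypothetical stand-in for Dense2Corpus with documents as columns:
    returns one sparse document per column, as a list of
    (term_id, weight) pairs for entries whose magnitude exceeds eps."""
    n_docs = len(matrix[0])
    corpus = []
    for j in range(n_docs):
        doc = [(i, row[j]) for i, row in enumerate(matrix) if abs(row[j]) > eps]
        corpus.append(doc)
    return corpus


corpus = columns_to_corpus(tfidf)

# Translate the first document back to words via id2word.
print([(id2word[i], w) for i, w in corpus[0]])
# prints [('koala', 0.5), ('eucalyptus', 0.2)]
```

With gensim and a NumPy matrix you would instead get the corpus from gensim.matutils.Dense2Corpus(tfidf_matrix, documents_columns=True) and pass the same dict along, e.g. gensim.models.LdaModel(corpus, id2word=id2word, num_topics=2); the num_topics value here is only illustrative.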
Upvotes: 1