Ziyuan

Reputation: 4558

Training an LDA model with gensim from an external tf-idf matrix and term list

I already have a tf-idf matrix, with rows for terms and columns for documents. Now I want to train an LDA model on this term-document matrix. The first step seems to be using gensim.matutils.Dense2Corpus to convert the matrix into gensim's corpus format. But how do I construct the id2word parameter? I have the list of terms (#terms == #rows), but I don't know the dictionary format, so I can't build one with functions like gensim.corpora.Dictionary.load_from_text. Any suggestions? Thank you.
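For reference, a gensim corpus is just an iterable of documents, where each document is a list of (term_id, weight) pairs and term_id is an integer. The pure-Python sketch below uses a hypothetical helper, dense_to_bow, written only to illustrate that format. It is not gensim's own code, though it is roughly what Dense2Corpus yields for a matrix whose columns are documents. Each row index serves as the term id, and zero weights are dropped.

```python
from typing import List, Sequence, Tuple

# One document in gensim's bag-of-words format: a list of (term_id, weight) pairs.
BowDoc = List[Tuple[int, float]]


def dense_to_bow(matrix: Sequence[Sequence[float]]) -> List[BowDoc]:
    """Hypothetical helper: turn a terms x documents matrix
    (rows = terms, columns = documents) into a list of sparse documents.

    The row index is used as the term id; zero weights are skipped.
    """
    if not matrix:
        return []
    num_docs = len(matrix[0])
    corpus: List[BowDoc] = []
    for doc_idx in range(num_docs):
        # Walk down one column, keeping only the non-zero entries.
        doc = [
            (term_id, float(row[doc_idx]))
            for term_id, row in enumerate(matrix)
            if row[doc_idx] != 0
        ]
        corpus.append(doc)
    return corpus


if __name__ == "__main__":
    # Made-up tf-idf weights: 3 terms (rows) x 2 documents (columns).
    tfidf = [
        [0.5, 0.0],
        [0.0, 0.7],
        [0.2, 0.1],
    ]
    print(dense_to_bow(tfidf))
    # prints [[(0, 0.5), (2, 0.2)], [(1, 0.7), (2, 0.1)]]
```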

Upvotes: 3

Views: 587

Answers (1)

Radim

Reputation: 4266

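id2word must map each term id (an integer) to its term (a string). With a matrix like yours, the term id is the row index, so row i corresponds to id2word[i].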

In other words, it must support id2word[123] == 'koala'.

A plain Python dict is the easiest option.

Upvotes: 1
