gensim lda model - calling update on a corpus with unseen words

Question

I'm trying to use gensim's LDA model. If I create the model from one corpus and then want to update it with a new corpus containing words that did not appear in the first one, how do I do that? When I simply call lda_model.update(new_corpus), I get the following error:

/Library/Python/2.7/site-packages/gensim/models/ldamodel.pyc in inference(self, chunk, collect_sstats)
    361             Elogthetad = Elogtheta[d, :]
    362             expElogthetad = expElogtheta[d, :]
--> 363             expElogbetad = self.expElogbeta[:, ids]
    364
    365             # The optimal phi_{dwk} is proportional to expElogthetad_k * expElogbetad_w.

IndexError: index 57 is out of bounds for axis 1 with size 57

I initialized lda_model with a corpus whose vocabulary has only 57 words, which is why the bound is 57 (the valid word ids are 0 to 56). I then called update on it with a corpus containing many more words, and that call fails.
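
To make the mismatch concrete, here is a minimal plain-Python sketch of what I think is going on. It is not gensim code: NUM_TERMS, out_of_range_ids, drop_unseen_ids and the toy new_corpus are all hypothetical names and data made up for illustration. It assumes the new corpus is in bag-of-words form, a list of documents where each document is a list of (word_id, count) pairs.

```python
# Plain-Python sketch; every name here is hypothetical, not part of gensim.
NUM_TERMS = 57  # vocabulary size the model was built with (valid ids: 0..56)


def out_of_range_ids(bow_corpus, num_terms):
    """Return the sorted word ids that have no column in a num_terms-wide model."""
    return sorted(
        {word_id for doc in bow_corpus for word_id, _ in doc if word_id >= num_terms}
    )


def drop_unseen_ids(bow_corpus, num_terms):
    """Return a copy of the corpus with out-of-range word ids removed."""
    return [
        [(word_id, count) for word_id, count in doc if word_id < num_terms]
        for doc in bow_corpus
    ]


# Toy "new" corpus: ids 57 and 60 were never seen by the original model.
new_corpus = [
    [(0, 2), (56, 1)],
    [(3, 1), (57, 4), (60, 1)],
]

bad_ids = out_of_range_ids(new_corpus, NUM_TERMS)
filtered = drop_unseen_ids(new_corpus, NUM_TERMS)

print(bad_ids)
print(filtered)
```

Dropping the out-of-range ids like this would presumably avoid the IndexError, but it also throws away exactly the new words I want the model to learn, so it is not what I am after.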

How do I get around this? I want to be able to update my LDA model with a new corpus that contains new words. Is this possible?
