Reputation: 1545
I have two questions related to the usage of gensim for LDA.
1) How can I create a model using one corpus, save it, and perhaps extend it later by training it on another corpus? Is that possible?
2) Can LDA be used to classify an unseen document, or does the model need to be rebuilt with that document included in the corpus? Is there an online way to do this and see the changes on the fly?
I have a fairly basic understanding of LDA and have used it for topic modeling on simple corpora with the lda and gensim libraries. Please point out any conceptual inconsistencies in the question. Thanks!
Upvotes: 0
Views: 1253
Reputation: 1545
I found this helpful. Gensim does allow an existing LDA model to be updated with an additional corpus. The module supports both estimating an LDA model from a training corpus and inferring the topic distribution of new, unseen documents. This is described here:
https://radimrehurek.com/gensim/models/ldamodel.html
Additionally, the algorithm is streamed, so it can process corpora larger than RAM. There is also a multicore implementation to speed up training.
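from gensim.models import LdaModel

lda = LdaModel(corpus, num_topics=10)  # train on the first corpus
lda.update(other_corpus)               # continue training on another corpus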
This is how the model can be updated.
Upvotes: 2