alily

Reputation: 299

Clustering documents using latent Dirichlet allocation

After identifying the topics/clusters of all documents with the LDA algorithm, do we need to rerun the whole process when new documents arrive in the database, or is there a way to map a new document directly onto the topics/clusters the model has already learned?

Upvotes: 1

Views: 537

Answers (2)

Brian O'Donnell

Reputation: 1876

To add to Lgiro's answer: gensim can also update an already trained LDA model with a new corpus, so you do not have to retrain from scratch. See this sample code from gensim:

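from gensim.models import LdaModel

# corpus, corpus2: bag-of-words corpora; doc_bow: a single bag-of-words document
lda = LdaModel(corpus, num_topics=100)  # train the model
print(lda[doc_bow])  # topic probability distribution for a document
lda.update(corpus2)  # update the LDA model with additional documents
print(lda[doc_bow])  # distribution for the same document after the update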

Upvotes: 0

Lgiro

Reputation: 772

Once you have a trained topic model, you can feed it a new document or set of documents and infer their distribution over the model's topics; no retraining is needed. I'm not sure what you are using for LDA, but Python's gensim library is very nice and well documented. See https://radimrehurek.com/gensim/wiki.html#latent-dirichlet-allocation for more information.

Upvotes: 1
