Q.H.

Reputation: 1456

Classify Text with Gensim LDA Model

For reference, I already looked at the following questions:

  1. Gensim LDA for text classification
  2. Python Gensim LDA Model show_topics function

I would like the LDA model I trained with Gensim to classify a sentence under one of the topics the model creates. Something along the lines of:

from gensim import models

lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for line in document: # where each line in the document is its own sentence for simplicity
    print('Sentence: ', line)
    topic = lda.parse(line) # hypothetical: this is where the classification would occur
    print('Topic: ', topic)

I know Gensim does not have a parse function, but how would one go about accomplishing this? Here is the documentation I've been following, though I haven't gotten anywhere with it:

https://radimrehurek.com/gensim/auto_examples/core/run_topics_and_transformations.html#sphx-glr-auto-examples-core-run-topics-and-transformations-py

Thanks in advance.

Edit: more documentation: https://radimrehurek.com/gensim/models/ldamodel.html

Upvotes: 2

Views: 1337

Answers (1)

Nils_Denter

Reputation: 498

Let me make sure I understand your problem: you want to train an LDA model on some documents and retrieve 7 topics. Then you want to classify new documents into one (or more?) of these topics, meaning you want to infer topic distributions for new, unseen documents.

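If so, the Gensim documentation covers this: indexing a trained model with a bag-of-words vector, lda[bow], returns that document's topic distribution as a list of (topic_id, probability) pairs. Adapted to your loop: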

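from gensim import models

# corpus, id2word (a gensim.corpora.Dictionary) and document are assumed to exist already
lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for count, line in enumerate(document, start=1): # each line in the document is its own sentence for simplicity
    print('\nSentence: ', line)
    tokens = line.split()  # tokenize the same way the training corpus was tokenized
    line_bow = id2word.doc2bow(tokens)  # words not in the dictionary are ignored
    doc_lda = lda[line_bow]  # list of (topic_id, probability) pairs
    # compare by probability; a plain max() would compare topic ids first
    topic_id, prob = max(doc_lda, key=lambda pair: pair[1])
    print(f'\nLine {count} assigned to Topic {topic_id} with {prob * 100:.2f}% probability!')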
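lda[line_bow] gives back (topic_id, probability) pairs, and by default Gensim drops topics whose probability is below the model's minimum_probability (0.01). Turning that list into a label, or into several labels if a sentence may belong to more than one topic, is plain Python. A minimal sketch, where assign_topics and its threshold parameter are hypothetical names chosen for illustration and the example distribution is made up:

```python
from typing import List, Optional, Sequence, Tuple


def assign_topics(
    doc_topics: Sequence[Tuple[int, float]],
    threshold: Optional[float] = None,
) -> List[Tuple[int, float]]:
    """Hypothetical helper: turn a (topic_id, probability) list, such as
    the one returned by lda[bow], into topic labels.

    threshold=None -> only the single most probable topic.
    threshold=t    -> every topic with probability >= t, most probable first.
    """
    if not doc_topics:
        return []
    # Rank by probability (descending), not by topic id.
    ranked = sorted(doc_topics, key=lambda pair: pair[1], reverse=True)
    if threshold is None:
        return ranked[:1]
    return [pair for pair in ranked if pair[1] >= threshold]


# Made-up distribution in the (topic_id, probability) format gensim uses.
example = [(0, 0.05), (2, 0.61), (5, 0.30), (6, 0.04)]
print(assign_topics(example))                  # [(2, 0.61)]
print(assign_topics(example, threshold=0.25))  # [(2, 0.61), (5, 0.3)]
print(max(example))                            # (6, 0.04): tuples compare topic ids first
```

To inspect every topic rather than only those above the cutoff, lda.get_document_topics(line_bow, minimum_probability=0.0) returns the full distribution in the same (topic_id, probability) format, which can be passed straight to a helper like this.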

Upvotes: 1
