Victor Wang

Reputation: 937

How to apply a sentence-level LDA model using Gensim?

Is it possible to apply a sentence-level LDA model using Gensim, as proposed in Bao and Datta (2014)? The paper is here.

The distinctive feature is that it makes the "one topic per sentence assumption" (p. 1376). This differs from other sentence-level methods, which typically allow each sentence to include multiple topics: "The most straightforward method is to treat each sentence as a document and apply the LDA model on the collection of sentences rather than documents." (p. 1376). However, I think it is more reasonable to assume that one sentence deals with one topic.

Thank you!

Upvotes: 0

Views: 1355

Answers (1)

jhl

Reputation: 691

If you split your documents into sentences, you can easily run what Brody & Elhadad (2010) call local-LDA, which just feeds your text data to LDA sentence by sentence. However, LDA will still give you more than one topic per sentence: by definition you get a probability for every topic, although gensim hides topics below its minimum_probability default of 0.01. This, of course, is not the same as the approach proposed by Bao & Datta.
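For illustration, here is a minimal sketch of the sentence-by-sentence approach. The helper names (split_sentences, dominant_topic, fit_sentence_lda), the naive regex sentence splitter, and the parameter values are my own assumptions, not something from either paper. Picking the highest-probability topic per sentence is only a post-hoc hard assignment on top of ordinary LDA; it is not the model proposed by Bao & Datta.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def dominant_topic(topic_probs):
    """Return the topic id with the highest probability from (id, prob) pairs."""
    return max(topic_probs, key=lambda pair: pair[1])[0]


def fit_sentence_lda(documents, num_topics=5, passes=10, seed=0):
    """Fit ordinary LDA with each sentence treated as its own document."""
    # Imported here so the two helpers above work without gensim installed.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel
    from gensim.utils import simple_preprocess

    # Flatten all documents into one list of sentences.
    sentences = [s for doc in documents for s in split_sentences(doc)]
    tokens = [simple_preprocess(s) for s in sentences]

    dictionary = Dictionary(tokens)
    corpus = [dictionary.doc2bow(t) for t in tokens]

    lda = LdaModel(
        corpus=corpus,
        id2word=dictionary,
        num_topics=num_topics,
        passes=passes,
        random_state=seed,
    )

    # Ask for (almost) all topic probabilities instead of only those above
    # the 0.01 default, then keep the single most probable topic per sentence.
    labels = []
    for sentence, bow in zip(sentences, corpus):
        probs = lda.get_document_topics(bow, minimum_probability=0.0)
        labels.append((sentence, dominant_topic(probs)))
    return lda, labels
```

Calling fit_sentence_lda(list_of_document_strings) returns the fitted model plus a list of (sentence, topic_id) pairs. For real text you would probably want a proper sentence tokenizer and stop-word removal in place of the regex.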

That said, the supplemental material to the article by Bao & Datta (2014) contains an .exe (written in C or C++, I assume; the readme doesn't say) plus usage instructions. You could just run that from the command line, or write a Python wrapper for it (converting the output to gensim format would be icing on the cake). If you do, please share your code, as it might be helpful to others.
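A wrapper could start from something like the sketch below, which only shows the generic mechanics of calling an external program from Python. The function name run_external_tool is hypothetical, and I am deliberately not guessing the executable's name or arguments: take those from the usage instructions in the supplemental material.

```python
import subprocess


def run_external_tool(command, workdir=None, timeout=None):
    """Run an external program and return its standard output as text.

    `command` is a list such as [path_to_executable, arg1, arg2, ...];
    the actual path and arguments must come from the tool's own readme.
    """
    result = subprocess.run(
        command,
        cwd=workdir,          # directory to run the tool in, if it matters
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on a non-zero exit
        timeout=timeout,
    )
    return result.stdout
```

If the tool writes its results to files rather than to standard output, the wrapper would read those files after the call returns; turning them into gensim-style (topic_id, probability) lists depends on an output format I have not checked.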

Upvotes: 3
