ten

Reputation: 115

Generating documents from LDA topic model

I'm learning a topic model from a set of documents and that's working well. But I'm wondering if any existing system will actually generate new documents from the topics and words in the model.

I.e., say I want a new document on topic 0: will any of Gensim/MALLET/other tools actually produce a new document given my choice of topic (or topics) as input? Or is this a roll-your-own kind of problem?

Say I have two topics:

topic #0: 0.009*river + 0.008*lake + 0.006*island + 0.005*mountain + 0.004*area + 0.004*park + 0.004*antarctic + 0.004*south + 0.004*mountains + 0.004*dam
topic #1: 0.026*relay + 0.026*athletics + 0.025*metres + 0.023*freestyle + 0.022*hurdles + 0.020*ret + 0.017*divisão + 0.017*athletes + 0.016*bundesliga + 0.014*medals

Is there any tool that will take "topic 0: .5, topic 1: .5, length: 7" and nicely produce a document like:

island freestyle river south medals mountains area

or something along those lines? I don't want to duplicate this if it already exists.

Upvotes: 0

Views: 709

Answers (1)

SJB

Reputation: 671

Have you read the developer's guide and tutorials on the MALLET website? The guide outlines how to create a document with a high probability of a certain topic:

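    // topicSortedWords and dataAlphabet are assumed to come from the trained model,
    // as in the MALLET guide (model.getSortedWords() and instances.getDataAlphabet()).
    StringBuilder topicZeroText = new StringBuilder();
    Iterator<IDSorter> iterator = topicSortedWords.get(0).iterator();

    // Append the five highest-ranked words of topic 0, separated by spaces.
    int rank = 0;
    while (iterator.hasNext() && rank < 5) {
        IDSorter idCountPair = iterator.next();
        topicZeroText.append(dataAlphabet.lookupObject(idCountPair.getID())).append(' ');
        rank++;
    }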

This code creates a new document with a high probability of being topic 0 by concatenating that topic's five highest-ranked words. It can easily be modified to draw from more than one topic and to produce a document of a given length.
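As a rough illustration of that modification, here is a minimal sketch that does not use MALLET at all: it takes a topic mixture and per-topic word weights as plain arrays, then picks a topic and a word from that topic for each position. The class name `TopicDocumentSampler`, its methods, and the hard-coded word lists (the top five words of each topic from the question) are assumptions made for the example. Because only the top words are listed, their weights are renormalized among themselves, so this is not the same as sampling from the full vocabulary distribution.

```java
import java.util.Random;

public class TopicDocumentSampler {

    /** Draws an index with probability proportional to the given non-negative weights. */
    static int sampleIndex(double[] weights, Random rng) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        double r = rng.nextDouble() * total;
        double cumulative = 0.0;
        for (int i = 0; i < weights.length; i++) {
            cumulative += weights[i];
            if (r < cumulative) {
                return i;
            }
        }
        return weights.length - 1; // guard against floating-point round-off
    }

    /**
     * Builds a document of `length` words. For each position, a topic is drawn
     * from topicWeights, then a word is drawn from that topic's word weights.
     */
    static String generate(double[] topicWeights, String[][] words,
                           double[][] wordWeights, int length, Random rng) {
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < length; i++) {
            int topic = sampleIndex(topicWeights, rng);
            int word = sampleIndex(wordWeights[topic], rng);
            if (i > 0) {
                doc.append(' ');
            }
            doc.append(words[topic][word]);
        }
        return doc.toString();
    }

    public static void main(String[] args) {
        // Hypothetical inputs: top five words and weights per topic, copied from the question.
        String[][] words = {
            {"river", "lake", "island", "mountain", "area"},
            {"relay", "athletics", "metres", "freestyle", "hurdles"}
        };
        double[][] wordWeights = {
            {0.009, 0.008, 0.006, 0.005, 0.004},
            {0.026, 0.026, 0.025, 0.023, 0.022}
        };
        double[] topicWeights = {0.5, 0.5}; // "topic 0: .5, topic 1: .5"

        // Prints seven space-separated words drawn from the two topics.
        System.out.println(generate(topicWeights, words, wordWeights, 7, new Random()));
    }
}
```

With a trained MALLET model, the word and weight arrays would presumably be filled from the model's sorted word lists rather than hard-coded; passing a seeded `Random` makes the output reproducible.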

Upvotes: 1
