Reputation: 11
I am trying to implement the TopicTiling algorithm on top of my trained LDA model. For the algorithm I need all of the topic IDs that get assigned to each single word in an unseen document; I will then take the most frequent topic ID for a given word and use that mode as the word's topic.
I am using the gensim library, so it is easy to get the topic->word distributions, where each topic's words are listed with their probabilities. However, how do I get "which topic(s) are/were assigned to a single word", i.e. the word->topic distributions?
Example:
s = "Banks are closed on Sunday"
Topic -> Word dist from gensim (TopicTag -> Prob*Word):
Topic 0 -> 0.3*Bank, 0.2*are
Topic 1 -> 0.2*closed, 0.1*Sunday
Topic 2 -> 0.4*Sunday, 0.3*on
What I want:
word -> TopicTag(frequency with which the given word was assigned that topic)
Banks -> Topic1(2), Topic2(2)
closed -> Topic0(1), Topic1(4)
Please also note that I am not interested in parsing the Topic -> Word dist results from gensim; I am interested in an accurate way to find which topic(s) my model assigns to each individual word in an unseen document.
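For concreteness, this is roughly the counting step I want to run. The word_topic_assignments list below is purely hypothetical (producing it is exactly what I am asking about):

    from collections import Counter, defaultdict

    # Hypothetical input: for every token of the unseen document, the topic ID
    # the model assigned to it. How to obtain this list is the open question.
    word_topic_assignments = [
        ("Banks", 1), ("Banks", 2), ("Banks", 1), ("Banks", 2),
        ("closed", 0), ("closed", 1), ("closed", 1), ("closed", 1), ("closed", 1),
    ]

    # Count how often each word received each topic ID ...
    counts = defaultdict(Counter)
    for word, topic_id in word_topic_assignments:
        counts[word][topic_id] += 1

    # ... and take the most frequent topic ID (the mode) per word.
    mode = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    print(dict(counts))   # e.g. {'Banks': Counter({1: 2, 2: 2}), ...}
    print(mode)           # e.g. {'Banks': 1, 'closed': 1}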
Thanks in advance.
Upvotes: 1
Views: 1152
Reputation: 61
You can get the matrix of topic-word weights from lda_model.state.get_lambda().
See also this mailing list thread: https://groups.google.com/d/msg/gensim/6N9-Y5KVQu0/soFqkEopMWgJ
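A minimal sketch of how that matrix could be turned into per-word topic information, assuming a trained LdaModel named lda_model and its gensim Dictionary named dictionary (both names are my assumptions, and the per-word step is not spelled out in the answer above):

    # Unnormalized topic-word weights, shape (num_topics, num_terms).
    lam = lda_model.state.get_lambda()

    # For a given word (column), the entries are roughly proportional to how
    # often that word was assigned to each topic during training, so
    # normalizing each column gives a word -> topic distribution.
    word_topic = lam / lam.sum(axis=0)

    word_id = dictionary.token2id.get("bank")
    if word_id is not None:
        print(word_topic[:, word_id])           # topic distribution for "bank"
        print(word_topic[:, word_id].argmax())  # most frequently assigned topic (the mode)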
Upvotes: 2
Reputation: 2541
I am also interested in knowing the answer. However, you can get the Topic -> Word dist without parsing via:

    y = ldavar.state.get_lambda()      # unnormalized topic-word weights, shape (num_topics, num_terms)
    for i in range(y.shape[0]):
        y[i] = y[i] / y[i].sum()       # normalize each row into a probability distribution

Now each row of y gives you the word distribution for one topic.
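A short usage sketch on top of the snippet above, assuming the model was trained with an id2word mapping (that this attribute is populated is an assumption here):

    top_ids = y[0].argsort()[::-1][:10]          # IDs of the 10 most probable words in topic 0
    print([ldavar.id2word[i] for i in top_ids])  # map the word IDs back to tokens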
Upvotes: 1