Serbay

Reputation: 11

LDA Gensim Word -> Topic Ids Distribution instead of Topic -> Word Distribution

I am trying to implement the TopicTiling algorithm on my trained LDA model. For the algorithm I need all of the topic IDs that are assigned to a single word in an unseen document. I will then find the most frequent topic ID for that word and assign it as the word's mode.
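To make the mode step described above concrete, here is a minimal sketch. It assumes the per-word topic IDs have already been collected somehow; the `assignments` dictionary and the `topic_mode` helper are hypothetical names used only for illustration, not part of gensim.

```python
from collections import Counter


def topic_mode(topic_ids):
    """Return the most frequent topic ID in a list of per-word assignments."""
    counts = Counter(topic_ids)
    # most_common(1) returns [(topic_id, count)]; on a tie, the ID seen first wins.
    return counts.most_common(1)[0][0]


# Hypothetical topic IDs gathered for each word (made-up data).
assignments = {
    "banks": [1, 2, 1, 1],
    "closed": [0, 1, 1, 1, 1],
}

# Map every word to its most frequent topic ID.
modes = {word: topic_mode(ids) for word, ids in assignments.items()}
print(modes)
```

The open question is therefore only how to obtain the lists of topic IDs per word from the model.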

I am using the gensim library, so it is very easy to get the topic -> word distribution, where the words are given with their probabilities. However, how do I find out which topic(s) are assigned to a single word, meaning the word -> topic distribution?

Example:
s = "Banks are closed on Sunday"

Topic -> Word Dist from Gensim:
TopicTag -> Prob*Word
Topic 0 -> 0.3*Banks, 0.2*are
Topic 1 -> 0.2*closed, 0.1*Sunday
Topic 2 -> 0.4*Sunday, 0.3*on

What I want:
word -> TopicTag(number of times the word was assigned that topic tag)
Banks -> Topic1(2), Topic2(2)
closed -> Topic0(1), Topic1(4)

Please also note that I am not interested in parsing the topic -> word distribution output from gensim. I am looking for an accurate way to find the topic(s) that my model assigns to each individual word of an unseen document.

Thanks in advance.

Upvotes: 1

Views: 1152

Answers (2)

Hongwei.Song

Reputation: 61

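You can get the matrix of topic-word weights from lda_model.state.get_lambda(). It has one row per topic and one column per vocabulary word, so a single column holds the weights of one word across all topics. See also this mailing list thread: https://groups.google.com/d/msg/gensim/6N9-Y5KVQu0/soFqkEopMWgJ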
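As a rough sketch of how that matrix could be turned into a word -> topic view, the example below normalizes each column of a small topics x words matrix. The matrix `lam` is a hand-written stand-in for the model's lambda matrix, and `word_topic_distribution`, `vocab` and `best_topic` are hypothetical names. Treating the column-normalized weights as an approximation of P(topic | word) is an assumption; it ignores the context of the specific document.

```python
def word_topic_distribution(lam, vocab):
    """Normalize each column of a topics x words weight matrix.

    lam   : list of rows, one row per topic, one column per word
    vocab : list of words, in column order
    Returns {word: [share of topic 0, share of topic 1, ...]}.
    """
    result = {}
    for j, word in enumerate(vocab):
        column = [row[j] for row in lam]  # weights of this word in every topic
        total = sum(column)
        result[word] = [w / total for w in column]
    return result


# Toy stand-in for the lambda matrix (3 topics x 3 words, made-up numbers).
vocab = ["bank", "closed", "sunday"]
lam = [
    [3.0, 1.0, 0.5],  # topic 0
    [1.0, 4.0, 1.5],  # topic 1
    [1.0, 0.0, 2.0],  # topic 2
]

dist = word_topic_distribution(lam, vocab)

# Index of the highest-weighted topic for each word.
best_topic = {w: max(range(len(p)), key=p.__getitem__) for w, p in dist.items()}
print(dist)
print(best_topic)
```

For per-document assignments, gensim's LdaModel.get_document_topics also accepts per_word_topics=True, which may be closer to what the question asks for, since it returns the likely topics for each word of the given bag-of-words.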

Upvotes: 2

Alok Nayak

Reputation: 2541

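I am also interested in knowing the answer. In the meantime, you can get the topic -> word distribution without parsing like this:

# ldavar is the trained gensim LdaModel
y = ldavar.state.get_lambda()  # array of shape (num_topics, num_words)
for i in range(y.shape[0]):
    # scale row i so that its weights sum to 1
    y[i] = y[i] / y[i].sum()

Now each row of y is the word distribution for one topic.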

Upvotes: 1
