What is the probability of a TERM for a specific TOPIC in Latent Dirichlet Allocation (LDA) in R

Question

I'm working in R with the package "topicmodels" and trying to better understand the code/package. In most of the tutorials and documentation I'm reading, people define topics by their 5 or 10 most probable terms. Here is an example:

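    library(topicmodels)
    data("AssociatedPress", package = "topicmodels")
    # Fit a 5-topic LDA model on the first 20 documents
    lda <- LDA(AssociatedPress[1:20, ], k = 5)
    topics(lda)     # most likely topic for each document
    terms(lda)      # most likely term for each topic
    terms(lda, 5)   # 5 most likely terms for each topic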

So the last line returns the 5 most probable terms for each of the 5 topics I've defined.

In the lda object I can access the gamma slot, which contains, for each document, the probability of belonging to each topic. Based on this I can extract the topics whose probability exceeds whatever threshold I prefer, instead of assigning the same number of topics to every document.
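A minimal sketch of that thresholding step, assuming gamma is a documents x topics matrix whose rows are topic probabilities summing to 1 (the layout of lda@gamma). The helper name topics_above_threshold and the 0.2 cut-off are hypothetical, and a toy matrix stands in for a fitted model so the snippet runs without topicmodels.

```r
# Hypothetical helper: for each document (row of a documents x topics
# probability matrix such as lda@gamma), return the indices of the topics
# whose probability exceeds `threshold`.
topics_above_threshold <- function(gamma, threshold = 0.2) {
  lapply(seq_len(nrow(gamma)), function(i) which(gamma[i, ] > threshold))
}

# Toy stand-in for lda@gamma: 3 documents, 4 topics, each row sums to 1.
gamma <- rbind(
  c(0.70, 0.10, 0.15, 0.05),
  c(0.30, 0.30, 0.30, 0.10),
  c(0.05, 0.05, 0.05, 0.85)
)

res <- topics_above_threshold(gamma, threshold = 0.2)
res
```

With a fitted model the call would be topics_above_threshold(lda@gamma, 0.2). lapply is used instead of apply(gamma, 1, ...) because apply silently simplifies the result into a matrix whenever every document happens to keep the same number of topics.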

My second step would then be to find out which words are most strongly associated with each topic. I can use the terms(lda) function to pull these out, but it only gives me a fixed number N of top terms per topic, with no indication of how strongly each term is associated.
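One way to replace the fixed N with a probability cut-off, mirroring the gamma thresholding, is sketched below. It assumes a topics x terms matrix of probabilities with the terms as column names; for a fitted model, posterior(lda)$terms from topicmodels should provide such a matrix, but a small toy matrix is used here so the snippet runs on its own. The helper terms_above_threshold and the 0.1 cut-off are illustrative choices, not part of topicmodels.

```r
# Hypothetical helper: for each topic (row of a topics x terms probability
# matrix with terms as column names), keep only the terms whose probability
# exceeds `threshold`, sorted from most to least probable.
terms_above_threshold <- function(term_prob, threshold = 0.01) {
  lapply(seq_len(nrow(term_prob)), function(k) {
    p <- term_prob[k, ]
    sort(p[p > threshold], decreasing = TRUE)
  })
}

# Toy stand-in for posterior(lda)$terms: 2 topics over a 4-word vocabulary.
term_prob <- rbind(
  c(market = 0.50, stock = 0.40, police = 0.05, court = 0.05),
  c(market = 0.02, stock = 0.08, police = 0.60, court = 0.30)
)

res <- terms_above_threshold(term_prob, threshold = 0.1)
res
```

Each list element is a named numeric vector, so both the terms and their probabilities are kept, and different topics can end up with different numbers of terms.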

In the model object I've also found the slot

    lda@beta

which contains a beta value per word per topic, but I'm having a hard time interpreting these values. They are all negative: some are around -6 and others around -200. I can't interpret them as probabilities, or as a measure of which words associate with a topic and how much more strongly. Is there a way to extract or calculate something that can be interpreted as such a measure?
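The topicmodels documentation describes beta as the logarithmized parameters of each topic's word distribution, which would explain the negative values: exponentiating recovers probabilities, so -6 corresponds to roughly 0.0025 and -200 to roughly 1.4e-87, i.e. practically never. A minimal sketch under that assumption follows; the helper beta_to_prob is hypothetical, and a toy log-probability matrix stands in for lda@beta so the snippet runs without a fitted model.

```r
# Hypothetical helper: lda@beta holds, per topic (row) and term (column),
# the logarithm of the term's probability under that topic. Exponentiating
# undoes the log; adding the vocabulary as column names makes it readable.
beta_to_prob <- function(log_beta, vocab) {
  prob <- exp(log_beta)
  colnames(prob) <- vocab
  prob
}

# Toy stand-in for lda@beta: 2 topics over a 4-word vocabulary.
vocab <- c("market", "stock", "police", "court")
log_beta <- log(rbind(c(0.50, 0.40, 0.05, 0.05),
                      c(0.02, 0.08, 0.60, 0.30)))

prob <- beta_to_prob(log_beta, vocab)
rowSums(prob)      # each topic's word distribution sums to (numerically) 1
exp(c(-6, -200))   # the magnitudes mentioned above, back on the probability scale
```

With a fitted model the call would be beta_to_prob(lda@beta, lda@terms); posterior(lda)$terms should return the same per-topic term probabilities directly.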

Many thanks,
Frederik
