How to choose the best parameters for the LDA algorithm?

Question

When using LDA for topic analysis, how can I measure the performance of the algorithm?

    library(topicmodels)
    data("AssociatedPress", package = "topicmodels")

    # Parameters for Gibbs sampling
    burnin <- 4000
    iter <- 2000
    thin <- 500
    seed <- list(1969, 5, 25, 102855, 2012)  # one seed per random start
    nstart <- 5
    best <- TRUE  # keep only the run with the highest posterior likelihood

    # Number of topics
    k <- 10

    # Run LDA with Gibbs sampling on the first 20 documents
    ldaOut <- LDA(AssociatedPress[1:20, ], k, method = "Gibbs",
                  control = list(nstart = nstart, seed = seed, best = best,
                                 burnin = burnin, iter = iter, thin = thin))

For example, is there any kind of precision, recall, or F-measure?

Ferran · Accepted Answer

Note that LDA is an unsupervised learning algorithm, so measures like F1 score or accuracy are not available: there are no true labels to compare against. Instead, the performance of the algorithm is generally assessed by how well the fitted probabilistic model explains the data, typically through the log-likelihood of a held-out test set.

The most common metrics for monitoring the performance of LDA are perplexity and log-likelihood. When comparing models, the one with the higher log-likelihood and lower perplexity is considered better.
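The link between the two metrics can be made concrete. Here is a minimal sketch in base R, assuming the usual definition of perplexity as the exponential of the negative per-token log-likelihood. The helper name `perplexity_from_loglik` and all the numbers are hypothetical, chosen only for illustration:

```r
# Hypothetical helper: perplexity as exp(-log-likelihood / number of tokens).
perplexity_from_loglik <- function(log_lik, n_tokens) {
  exp(-log_lik / n_tokens)
}

# Two made-up models scored on the same 1000-token test set.
n_tokens <- 1000
loglik_a <- -7000  # higher (less negative) log-likelihood
loglik_b <- -7500  # lower log-likelihood

perp_a <- perplexity_from_loglik(loglik_a, n_tokens)  # exp(7)
perp_b <- perplexity_from_loglik(loglik_b, n_tokens)  # exp(7.5)

# The model with the higher log-likelihood has the lower perplexity.
perp_a < perp_b
```

Because perplexity is normalised by the token count, it is easier to compare across test sets of different sizes than the raw log-likelihood.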

In the topicmodels library you can find the functions perplexity and logLik to extract both measures. In your case it would be something similar to:

    perplexity(ldaOut, newdata = AssociatedPress[1:20, ])

To compute logLik you need to pass the Gibbs list from the fitted model; have a look at the documentation here (p. 8): https://cran.r-project.org/web/packages/topicmodels/topicmodels.pdf
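Evaluating on the same documents used for fitting tends to give an optimistic figure, so a held-out split is often preferred. Here is a rough sketch, assuming an arbitrary split of AssociatedPress (100 training and 20 test documents), shortened Gibbs settings to keep the run time down, and candidate topic counts picked purely for illustration:

```r
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

# Assumed split (not from the question): 100 training docs, 20 held-out docs
train <- AssociatedPress[1:100, ]
test  <- AssociatedPress[101:120, ]

# Illustrative candidates for the number of topics
candidate_k <- c(5, 10, 20)

# Fit one Gibbs model per k and score it on the held-out documents
held_out <- sapply(candidate_k, function(k) {
  fit <- LDA(train, k = k, method = "Gibbs",
             control = list(seed = 1969, burnin = 200, iter = 500, thin = 100))
  perplexity(fit, newdata = test)
})
names(held_out) <- candidate_k

held_out  # lower is better
```

The k with the lowest held-out perplexity is a reasonable starting point, though it is worth reading the resulting topics as well, since the best score does not always give the most interpretable topics.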
