Reputation: 1238
Using LDA topic analysis, how can I measure the performance of the LDA algorithm?
library(topicmodels)
# parameters for Gibbs sampling
burnin <- 4000
iter <- 2000
thin <- 500
seed <-list(1969,5,25,102855,2012)
nstart <- 5
best <- TRUE
#Number of topics
k <- 10
data("AssociatedPress", package = "topicmodels")
#Run LDA with Gibbs
ldaOut <- LDA(AssociatedPress[1:20,], k, method = "Gibbs",
              control = list(nstart = nstart, seed = seed, best = best,
                             burnin = burnin, iter = iter, thin = thin))
For example, is there any kind of precision, recall, or F-measure?
Upvotes: 1
Views: 187
Reputation: 840
Note that LDA is an unsupervised learning algorithm, so measures like F1 score or accuracy are not available: there are no true labels to compare against. Instead, the algorithm is generally assessed by how well the fitted probabilistic model explains the data, typically through the log-likelihood of a held-out test set.
The most common metrics for monitoring the performance of LDA are perplexity and log-likelihood. A model with higher log-likelihood and lower perplexity is considered better.
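The two metrics are closely related. As a sketch, using the usual definition of perplexity for topic models (the symbols here are my own notation, not taken from the package: $D$ is the number of test documents, $\mathbf{w}_d$ the words of document $d$, and $N_d$ its number of tokens):

```latex
\operatorname{perplexity}(\mathcal{D}_{\text{test}})
  = \exp\!\left( - \frac{\sum_{d=1}^{D} \log p(\mathbf{w}_d)}{\sum_{d=1}^{D} N_d} \right)
```

So perplexity is the exponential of the negative per-token log-likelihood, which is why a higher log-likelihood goes together with a lower perplexity.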
In the topicmodels library you can find the functions perplexity and logLik to extract both measures. In your case it would be something similar to:
perplexity(ldaOut, newdata = AssociatedPress[1:20,])
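The call above scores the same 20 documents the model was trained on. A held-out split usually gives a more honest estimate, and it can also be used to compare different numbers of topics. Here is a minimal sketch of that idea. The split sizes, the candidate values of k, and the short Gibbs chains are arbitrary choices for illustration, not recommendations:

```r
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

# Assumption: an arbitrary train/test split, chosen only for illustration
train <- AssociatedPress[1:200, ]
test  <- AssociatedPress[201:250, ]

# Hypothetical candidate numbers of topics to compare
candidate_k <- c(5, 10, 20)

# Fit one Gibbs model per k (short chains to keep the example fast)
# and score each model on the held-out documents
heldout_perplexity <- sapply(candidate_k, function(k) {
  fit <- LDA(train, k = k, method = "Gibbs",
             control = list(seed = 1969, burnin = 200, iter = 200))
  perplexity(fit, newdata = test)
})

names(heldout_perplexity) <- candidate_k
print(heldout_perplexity)
```

Lower held-out perplexity suggests a better fit, although it does not always agree with how interpretable the topics look to a human reader.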
To compute the log-likelihood, call logLik on the fitted model (or on the Gibbs list that is returned when best = FALSE); have a look at the documentation here (p. 8): https://cran.r-project.org/web/packages/topicmodels/topicmodels.pdf
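As a rough sketch, assuming a single model is returned (the default best = TRUE), logLik can, to my knowledge, be applied to the fitted object directly. The short chain settings below are only there to keep the example quick:

```r
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

# Assumption: short Gibbs chain for illustration; best = TRUE is the default,
# so LDA() returns a single fitted model here
fit <- LDA(AssociatedPress[1:20, ], k = 10, method = "Gibbs",
           control = list(seed = 1969, burnin = 200, iter = 200))

# Log-likelihood of the fitted model (an object of class "logLik")
ll <- logLik(fit)
print(ll)

# Plain numeric value, e.g. for comparing several fits
ll_value <- as.numeric(ll)
```

When comparing models fitted to the same data, the one with the higher (less negative) value fits the training documents better.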
Upvotes: 3