Reputation: 21
I am working with the Dynamic Topic Models package developed by Blei. I am new to LDA, although I understand how it works.
I would like to know what the output file named
lda-seq/topic-000-var-obs.dat
stores.
I know that lda-seq/topic-001-var-e-log-prob.dat
stores the log of the variational posterior, and that exponentiating its values gives the probability of each word within topic 001.
Thanks
Upvotes: 2
Views: 691
Reputation: 1106
I have failed to find a concrete answer anywhere. However, the documentation's sample.sh
states
The code creates at least the following files:
- topic-???-var-e-log-prob.dat: the e-betas (word distributions) for topic ??? for all times.
...
- gam.dat
without mentioning the topic-000-var-obs.dat
file, which suggests that it is not essential for most analyses.
obs
suggests observations. After digging around in the example/model_run
results, I plotted the sum across epochs for each word/token using:
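# Read the flat vector of values written for topic 000
temp = scan("dtm/example/model_run/lda-seq/topic-000-var-obs.dat")
# One row per token, one column per epoch (10 in the example run)
temp.matrix = matrix(temp, ncol = 10, byrow = TRUE)
# Sum each token's values across epochs and plot the sums by token index
plot(rowSums(temp.matrix))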
and the result is something like:
The general trend of the non-negative values is decreasing, and many values are floored (in this case to -11.00972 ≈ log(1.65e-05)).
This suggests that these values are weightings or some other measure of influence on the model. The model appears to remove some tokens, and the influence/importance of the others tapers off over the index. The latter trend may be caused by preprocessing, such as sorting tokens by tf-idf when creating the dictionary.
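To put a number on how many values sit at the floor, something like the following sketch could be used. It assumes the floor is simply the minimum value in the file and that the file has the same token-by-epoch layout used above; `floored_share` is a hypothetical helper, and the input here is synthetic data standing in for the output of `scan()`, not values from the real file.

```r
# Hypothetical helper: report the floor value, the share of entries at the
# floor, and the tokens (rows) that are floored in every epoch.
# Assumption: the floor is the smallest value present in the data.
floored_share <- function(values, n_times, tol = 1e-6) {
  m <- matrix(values, ncol = n_times, byrow = TRUE)  # tokens x epochs
  floor_value <- min(m)
  at_floor <- abs(m - floor_value) < tol
  list(
    floor_value = floor_value,
    share = mean(at_floor),
    all_floored_tokens = which(rowSums(at_floor) == n_times)
  )
}

# Synthetic stand-in for scan("...-var-obs.dat"): 3 tokens, 2 epochs.
fake <- c(-11.00972, -11.00972,
            2.5,       1.0,
          -11.00972,   0.3)
res <- floored_share(fake, n_times = 2)
print(res$share)               # fraction of entries at the floor
print(res$all_floored_tokens)  # row indices floored in every epoch
```

With the real file, `values` would be the vector returned by `scan()` and `n_times` would be 10 for the example run.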
Interestingly, the row-sum values vary for both the floored tokens and the set with more positive values:
temp = scan("~/Documents/Python/inference/project/dtm/example/model_run/lda-seq/topic-009-var-obs.dat")
temp.matrix = matrix(temp, ncol = 10, byrow = TRUE)
plot(rowSums(temp.matrix))
Upvotes: 0
Reputation: 11
topic-000-var-e-log-prob.dat stores the log of the variational posterior of topic 1.
topic-001-var-e-log-prob.dat stores the log of the variational posterior of topic 2.
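A minimal sketch of that mapping in R, for illustration only. `topic_file` and `topic_probs` are hypothetical helpers, not part of the DTM package, and the numbers below are synthetic rather than read from a real run. It assumes the file is laid out with one row per word and one column per time slice, as in the `matrix(..., byrow = TRUE)` calls in the other answer.

```r
# Hypothetical helper: file name for a 1-based topic number.
# The files are zero-indexed, so topic 1 is stored in topic-000-...
topic_file <- function(topic, dir = "lda-seq") {
  file.path(dir, sprintf("topic-%03d-var-e-log-prob.dat", topic - 1))
}

# Hypothetical helper: reshape the flat vector from scan() into a
# words x time-slices matrix and exponentiate the log values.
topic_probs <- function(values, n_times) {
  log_prob <- matrix(values, ncol = n_times, byrow = TRUE)
  exp(log_prob)
}

# Synthetic stand-in for scan(topic_file(1)): 3 words, 2 time slices.
fake <- log(c(0.5, 0.2,
              0.3, 0.3,
              0.2, 0.5))
probs <- topic_probs(fake, n_times = 2)

print(topic_file(2))   # "lda-seq/topic-001-var-e-log-prob.dat"
print(colSums(probs))  # per-time-slice totals over all words
```

If the values really are word distributions, each column of `probs` should sum to roughly 1, which is a quick sanity check on the reshaping.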
Upvotes: 1