user2217543

Reputation: 21

Dynamic Topic model output - Blei format

I am working with the Dynamic Topic Models package that was developed by Blei. I am new to LDA, but I understand the basics.

What does the output file lda-seq/topic-000-var-obs.dat store?

I know that lda-seq/topic-001-var-e-log-prob.dat stores the log of the variational posterior, and that applying the exponential to it gives the probability of each word within topic 001.

Thanks

Upvotes: 2

Views: 691

Answers (2)

vincentmajor

Reputation: 1106

I have failed to find a concrete answer anywhere. However, the documentation's sample.sh states

The code creates at least the following files:
- topic-???-var-e-log-prob.dat: the e-betas (word distributions) for topic ??? for all times.
...
- gam.dat

without mentioning the topic-000-var-obs.dat file, which suggests that it is not imperative for most analyses.

Speculation

obs suggests observations. After a little digging around in the example/model_run results, I plotted the sum across epochs for each word/token using:

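# Read the flat vector of values and reshape it: one row per token, one column per epoch (10 here)
temp <- scan("dtm/example/model_run/lda-seq/topic-000-var-obs.dat")
temp.matrix <- matrix(temp, ncol = 10, byrow = TRUE)
# Sum across epochs for each token and plot against the token index
plot(rowSums(temp.matrix))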

and the result is something like:

[Figure: plot of row sums]

The general trend of the non-negative values is decreasing, and many values are floored (in this case to -11.00972 ≈ log(1.65e-05)), suggesting that these values are weightings or some other measure of influence on the model. The model appears to remove some tokens, and the influence/importance of the others tapers off over the index. The latter trend may be caused by preprocessing, such as sorting tokens by tf-idf when creating the dictionary.
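One way to check the "floored" observation is to count how many entries in each epoch sit exactly at the minimum. The following is only a rough sketch on made-up numbers: the helper name floored_fraction and the toy values are my own and not part of the DTM package, and it assumes the same row-major token-by-epoch layout used in the snippet above.

```r
# Hypothetical helper (not part of the DTM package): for each epoch (column),
# report the share of tokens whose value equals the overall minimum ("floor").
floored_fraction <- function(values, n_times, tol = 1e-8) {
  stopifnot(length(values) %% n_times == 0)
  m <- matrix(values, ncol = n_times, byrow = TRUE)  # rows: tokens, cols: epochs
  floor_value <- min(m)
  list(floor_value = floor_value,
       fraction = colMeans(abs(m - floor_value) < tol))
}

# Toy stand-in for scan(".../topic-000-var-obs.dat"): 4 tokens, 2 epochs.
toy <- c(-11.00972,   0.8,
           1.5,     -11.00972,
         -11.00972, -11.00972,
           0.3,       0.1)
res <- floored_fraction(toy, n_times = 2)
res$floor_value  # the floor found in the data
res$fraction     # share of floored tokens in each epoch
```

On the real file, passing the scan() result with n_times = 10 would show whether the floor is the same in every epoch and how many tokens it affects.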

Interestingly, the row-sum values vary for both the floored tokens and the set with more positive values:

[Figure: row sums for a different example, topic-009-var-obs.dat]

# Same steps for a different topic file
temp <- scan("~/Documents/Python/inference/project/dtm/example/model_run/lda-seq/topic-009-var-obs.dat")
temp.matrix <- matrix(temp, ncol = 10, byrow = TRUE)
plot(rowSums(temp.matrix))

Upvotes: 0

user3032969

Reputation: 11

topic-000-var-e-log-prob.dat stores the log of the variational posterior for topic 1.

topic-001-var-e-log-prob.dat stores the log of the variational posterior for topic 2.
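As a minimal sketch of how such a file could be turned into probabilities, assuming the values are stored term by term (row-major, one column per time slice): the helper name log_prob_to_prob and the toy numbers below are hypothetical and only stand in for a real scan() of the file.

```r
# Hypothetical helper: reshape the flat vector from a
# topic-???-var-e-log-prob.dat file into a terms x times matrix
# and exponentiate it to get probabilities.
log_prob_to_prob <- function(values, n_times) {
  stopifnot(length(values) %% n_times == 0)
  log_prob <- matrix(values, ncol = n_times, byrow = TRUE)  # rows: terms, cols: time slices
  exp(log_prob)
}

# Toy stand-in for scan("lda-seq/topic-001-var-e-log-prob.dat"):
# 3 terms, 2 time slices, stored term by term.
values <- log(c(0.5, 0.2,
                0.3, 0.3,
                0.2, 0.5))
prob <- log_prob_to_prob(values, n_times = 2)
colSums(prob)  # sanity check: each time slice should sum to about 1
prob[1, 2]     # probability of term 1 at time slice 2
```

With the real file, n_times would be the number of time slices in the run (10 in the package's example), and a column sum far from 1 would suggest the layout assumption is wrong.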

Upvotes: 1
