Gabrer

Reputation: 475

Debug a Latent Dirichlet Allocation implementation

I have implemented my own LDA in Python; it is a deliberately didactic implementation. I am wondering how to debug it: because LDA is stochastic, every run returns different results, and for the same reason I can't directly compare my output with that of other libraries.

So, how can I debug my implementation? Is there a [corpus, document] dataset with known target topics that I could expect to recover, at least to some extent?

Upvotes: 1

Views: 224

Answers (2)

R.Habib

Reputation: 29

Two things that you can do to test your implementation:

1) Generate some synthetic data from a known set of topic distributions, i.e. by actually sampling from the LDA generative model, and make sure that you are able to recover something close to those topics.

2) Calculate the log-likelihood of the data under the current topic assignment at each iteration and make sure that it increases overall as the iterations proceed (though not necessarily monotonically).

This won't guarantee that your implementation is correct, but it should help you spot anything obviously wrong.
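Both checks can be sketched with NumPy. This is only a minimal illustration, not a reference implementation: the function names (`generate_corpus`, `log_likelihood`), the block-structured topics, and all sizes are assumptions made up for the example. The likelihood here is the marginal word likelihood given document-topic and topic-word distributions, which is one common variant; adapt it to whatever quantities your sampler tracks.

```python
import numpy as np


def generate_corpus(phi, alpha, n_docs, doc_len, seed=0):
    """Sample documents from the LDA generative model with known topics phi."""
    rng = np.random.default_rng(seed)
    n_topics, vocab_size = phi.shape
    # One topic mixture per document, drawn from a symmetric Dirichlet prior.
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)
    docs = []
    for d in range(n_docs):
        # Draw a topic for every token, then a word from that topic.
        z = rng.choice(n_topics, size=doc_len, p=theta[d])
        words = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])
        docs.append(words)
    return docs, theta


def log_likelihood(docs, theta, phi):
    """Sum over tokens of log p(w | theta_d, phi), with topics marginalised out."""
    total = 0.0
    for d, words in enumerate(docs):
        word_probs = theta[d] @ phi  # shape: (vocab_size,)
        total += np.log(word_probs[words]).sum()
    return total


# Hypothetical ground truth: 3 topics over 9 words, each topic favouring
# its own block of 3 words so the topics are easy to tell apart.
true_phi = np.full((3, 9), 0.1 / 6)
for k in range(3):
    true_phi[k, 3 * k:3 * k + 3] = 0.3
true_phi /= true_phi.sum(axis=1, keepdims=True)

docs, theta = generate_corpus(true_phi, alpha=0.5, n_docs=50, doc_len=100, seed=0)

# The generating parameters should explain the data far better than
# uninformative (uniform) topics do.
uniform_phi = np.full((3, 9), 1.0 / 9)
ll_true = log_likelihood(docs, theta, true_phi)
ll_uniform = log_likelihood(docs, theta, uniform_phi)
```

Feeding `docs` to your own sampler and comparing the learned topics against `true_phi` (up to a permutation of the topic order) gives a concrete target, and the log-likelihood of your estimates should climb from roughly `ll_uniform` toward `ll_true` as training proceeds.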

Upvotes: 1

Nathan Thompson

Reputation: 335

You have two calls to the random library in your code, one to randint on line 221 and one to random.choice on line 336. Setting the random seed at the beginning with:

random.seed(666) # The most metal of random seeds

should give you reproducible results for debugging.
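A quick way to convince yourself that seeding works is to run the same random calls twice with the same seed and compare the results. The helper `sample_run` below is a hypothetical stand-in for the random calls in your code, not part of it:

```python
import random


def sample_run(seed):
    """Seed the global generator, then make the same kinds of calls LDA code makes."""
    random.seed(seed)
    topic_ids = [random.randint(0, 9) for _ in range(5)]
    picked = random.choice(["a", "b", "c"])
    return topic_ids, picked


first = sample_run(666)
second = sample_run(666)
# Identical seeds give identical draws, so two runs can be compared step by step.
same = first == second
```

Note that `random.seed` only affects Python's built-in `random` module. If the code also draws from NumPy, that generator needs its own seed (for example `np.random.seed(666)` or a seeded `np.random.default_rng`).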

Another handy trick for debugging in general is to add

import ipdb; ipdb.set_trace()

to one of your loops. This will stop execution and bring up a console prompt at that point, which lets you poke around and make sure everything is doing what it's supposed to.

Finally, when in doubt, print() it out.

Upvotes: 1
