Reputation: 475
I have implemented my own LDA in Python; it is a very didactic implementation. I was wondering how to debug my code: due to the stochastic nature of LDA, every execution returns different results, and for the same reason I can't directly compare my results with those of other libraries.
So, how can I debug my implementation? Is there a [corpus, document] dataset I can use that comes with known target topics to be extracted (at least to some extent)?
Upvotes: 1
Views: 224
Reputation: 29
Two things that you can do to test your implementation:
1) Generate synthetic data from a known set of topic distributions, i.e. by actually sampling from the LDA generative model, and make sure that you are able to recover something close to those topics.
2) Calculate the log-likelihood of the data under the current topic assignment at each iteration and make sure that it increases gradually over the iterations (though not necessarily monotonically).
This won't guarantee that your implementation is correct, but it should help you spot anything obviously wrong.
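A minimal sketch of both checks, assuming a collapsed Gibbs sampler with symmetric Dirichlet priors alpha (document-topic) and beta (topic-word) and NumPy. The function names generate_lda_corpus and log_joint, the toy topics and all parameter values are hypothetical, not taken from the asker's code. generate_lda_corpus samples documents from known topics via the LDA generative process; log_joint computes the collapsed log p(w, z) for a given assignment, whose first part, log p(w | z), is the likelihood of the data under that assignment.

```python
import math
import numpy as np

# Vectorised log-gamma built from the standard library (avoids a SciPy dependency)
lgamma = np.vectorize(math.lgamma, otypes=[float])


def generate_lda_corpus(n_docs, doc_len, topic_word, alpha, rng):
    """Sample documents from the LDA generative process with known topics.

    topic_word is a (K, V) array whose rows are topic-word distributions.
    Returns the word ids and the true topic assignment of every document.
    """
    n_topics, vocab_size = topic_word.shape
    docs, assignments = [], []
    for _ in range(n_docs):
        theta = rng.dirichlet([alpha] * n_topics)        # document's topic mixture
        z = rng.choice(n_topics, size=doc_len, p=theta)  # topic of each token
        w = np.array([rng.choice(vocab_size, p=topic_word[k]) for k in z])
        docs.append(w)
        assignments.append(z)
    return docs, assignments


def log_joint(docs, assignments, n_topics, vocab_size, alpha, beta):
    """Collapsed log p(w, z) under symmetric Dirichlet priors alpha and beta."""
    n_dk = np.zeros((len(docs), n_topics))    # document-topic counts
    n_kv = np.zeros((n_topics, vocab_size))   # topic-word counts
    for d, (w, z) in enumerate(zip(docs, assignments)):
        np.add.at(n_dk[d], z, 1)
        np.add.at(n_kv, (z, w), 1)
    # log p(w | z): one Dirichlet-multinomial term per topic
    ll_w = n_topics * (math.lgamma(vocab_size * beta) - vocab_size * math.lgamma(beta))
    ll_w += lgamma(n_kv + beta).sum() - lgamma(n_kv.sum(axis=1) + vocab_size * beta).sum()
    # log p(z): one Dirichlet-multinomial term per document
    ll_z = len(docs) * (math.lgamma(n_topics * alpha) - n_topics * math.lgamma(alpha))
    ll_z += lgamma(n_dk + alpha).sum() - lgamma(n_dk.sum(axis=1) + n_topics * alpha).sum()
    return ll_w + ll_z


# Toy "known" topics (hypothetical): topic k puts most of its mass on words 4k..4k+3
K, V = 3, 12
topic_word = np.full((K, V), 0.01)
for k in range(K):
    topic_word[k, 4 * k:4 * k + 4] = 1.0
topic_word /= topic_word.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
docs, true_z = generate_lda_corpus(200, 50, topic_word, alpha=0.1, rng=rng)
random_z = [rng.integers(K, size=len(w)) for w in docs]

ll_true = log_joint(docs, true_z, K, V, alpha=0.1, beta=0.01)
ll_random = log_joint(docs, random_z, K, V, alpha=0.1, beta=0.01)
print(ll_true, ll_random)  # the true assignment should score far higher
```

In your own code you would call something like log_joint with the sampler's current assignments once per iteration. On a synthetic corpus like this one it should rise (with some noise) toward roughly ll_true, and the learned topic-word counts should concentrate on the same word groups as topic_word.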
Upvotes: 1
Reputation: 335
You have two calls to the random module in your code: one to randint on line 221 and one to random.choice on line 336. Setting the random seed at the beginning with:
import random
random.seed(666)  # The most metal of random seeds
should give you reproducible results for debugging.
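To see the effect of seeding, here is a minimal sketch in which a hypothetical noisy_step function stands in for the sampler code that calls randint and random.choice; running it twice with the same seed produces identical draws. Note that random.seed only affects the standard-library random module; any other generator the code uses (NumPy's, for instance) has to be seeded separately.

```python
import random


def noisy_step(n_topics, vocab):
    """Hypothetical stand-in for code that calls randint and random.choice."""
    topic = random.randint(0, n_topics - 1)  # randint's upper bound is inclusive
    word = random.choice(vocab)
    return topic, word


def run(seed, n_steps=5):
    random.seed(seed)  # reset the global generator before each run
    return [noisy_step(3, ["apple", "banana", "cherry"]) for _ in range(n_steps)]


first = run(666)
second = run(666)
print(first == second)  # True: same seed, same sequence of draws
```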
Another handy trick for debugging in general is to add
import ipdb; ipdb.set_trace()
to one of your loops. This will stop execution and bring up an interactive console prompt at that point, which lets you poke around and make sure everything is doing what it's supposed to.
Finally, when in doubt, print() it out.
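For LDA specifically, one useful thing to print every few iterations is the highest-count words of each topic, so you can watch whether the topics become coherent. This is a minimal sketch assuming the implementation keeps a K x V topic-word count matrix; the function name top_words and the toy vocabulary and counts are hypothetical.

```python
import numpy as np


def top_words(topic_word_counts, vocab, n=3):
    """Return the n highest-count words for each topic (rows of a K x V count matrix)."""
    tops = []
    for row in topic_word_counts:
        best = np.argsort(row)[::-1][:n]  # indices of the largest counts, descending
        tops.append([vocab[i] for i in best])
    return tops


vocab = ["ball", "goal", "team", "vote", "party", "law"]
counts = np.array([[9, 7, 5, 0, 1, 0],
                   [0, 1, 0, 8, 6, 4]])
for k, words in enumerate(top_words(counts, vocab)):
    print(f"topic {k}: {' '.join(words)}")
# topic 0: ball goal team
# topic 1: vote party law
```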
Upvotes: 1