Reputation: 23
I have started using Galago for document retrieval. I want to cluster documents (initially, the documents retrieved by any model) using LDA. I would prefer a Java-based implementation that can be integrated into my Galago code. I'd appreciate it if you could let me know which open-source LDA implementation is most suitable for this purpose.
Thank you in advance for your help!
Upvotes: 2
Views: 307
Reputation: 987
There's a fast algorithm for LDA from this paper:
S. Arora, R. Ge, Y. Halpern, D. Mimno, A. Moitra, D. Sontag, Y. Wu, M. Zhu. A Practical Algorithm for Topic Modeling with Provable Guarantees. 30th International Conference on Machine Learning (ICML), 2013.
It has a Java implementation by one of the authors (D. Mimno) on GitHub: https://github.com/mimno/anchor
I've tried this implementation briefly and found that it gave good results quickly. As with all LDA/topic modeling, choosing the right number of topics can be challenging.
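Whichever implementation you end up with, the clustering step itself can be simple: once you have a topic distribution per document, assign each document to its highest-weight topic. Here is a minimal sketch of that idea, assuming a hypothetical `docTopics` matrix (one row per retrieved document, one column per topic). The class name, method names, and the numbers are made up for illustration and are not part of Galago or the anchor code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns per-document topic proportions into hard clusters.
class TopicClusters {

    // For each document, return the index of its highest-weight topic.
    // Ties go to the lowest topic index.
    static int[] assignClusters(double[][] docTopics) {
        int[] labels = new int[docTopics.length];
        for (int d = 0; d < docTopics.length; d++) {
            int best = 0;
            for (int k = 1; k < docTopics[d].length; k++) {
                if (docTopics[d][k] > docTopics[d][best]) {
                    best = k;
                }
            }
            labels[d] = best;
        }
        return labels;
    }

    // Group document indices by their assigned topic (cluster id).
    static Map<Integer, List<Integer>> groupByCluster(int[] labels) {
        Map<Integer, List<Integer>> clusters = new TreeMap<>();
        for (int d = 0; d < labels.length; d++) {
            clusters.computeIfAbsent(labels[d], k -> new ArrayList<>()).add(d);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Made-up topic proportions for four documents and three topics.
        double[][] docTopics = {
            {0.7, 0.2, 0.1},
            {0.1, 0.8, 0.1},
            {0.6, 0.3, 0.1},
            {0.2, 0.2, 0.6}
        };
        System.out.println(groupByCluster(assignClusters(docTopics)));
    }
}
```

This is a hard assignment, so a document that mixes several topics almost evenly still lands in exactly one cluster. If that matters for your data, you could keep the full topic vectors and cluster on those instead.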
Upvotes: 0