Reputation: 6465
I'm quite familiar with Hadoop but totally new to Apache Spark. Currently I'm using the LDA (Latent Dirichlet Allocation) algorithm implemented in Mahout for topic discovery. However, since I need to make the process faster, I'd like to switch to Spark, but the LDA (or CVB) algorithm is not implemented in Spark MLlib. Does this mean I have to implement it from scratch myself? If so, does Spark provide any tools that make that easier?
Upvotes: 2
Views: 2722
Reputation: 151
Regarding how to use the new Spark LDA API in 1.3:
Here is an article describing the new API: Topic modeling with LDA: MLlib meets GraphX
It also links to example code showing how to vectorize text input: GitHub LDA Example
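For a rough picture of what that vectorization step involves, here is a minimal Scala sketch (not the linked example itself) that turns a plain-text corpus, one document per line, into the (document id, term-count vector) pairs that MLlib's LDA expects. The file path, tokenization rule, and helper name are placeholders:

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.rdd.RDD

    // Hypothetical helper: build (doc id, term-count vector) pairs plus an
    // index -> term map, from a text file with one document per line.
    def vectorize(sc: SparkContext, path: String): (RDD[(Long, Vector)], Map[Int, String]) = {
      // Very naive tokenization; the linked example uses a richer pipeline.
      val docs: RDD[Seq[String]] = sc.textFile(path)
        .map(_.toLowerCase.split("""\W+""").filter(_.length > 3).toSeq)

      // Vocabulary: term -> column index, broadcast to the executors.
      val vocab = docs.flatMap(identity).distinct().collect().zipWithIndex.toMap
      val vocabB = sc.broadcast(vocab)

      // One sparse term-count vector per document, keyed by document id.
      val corpus = docs.zipWithIndex().map { case (terms, id) =>
        val counts = terms.groupBy(identity).toSeq.map { case (term, occ) =>
          (vocabB.value(term), occ.size.toDouble)
        }
        (id, Vectors.sparse(vocabB.value.size, counts))
      }
      (corpus, vocab.map(_.swap))
    }

On later Spark releases, ml.feature.CountVectorizer can replace most of this hand-rolled counting.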
Upvotes: 3
Reputation: 4648
Spark 1.3.0 is out now, so LDA is available.
cf. https://issues.apache.org/jira/browse/SPARK-1405
Upvotes: 3
Reputation: 53809
LDA has been added to Spark very recently. It is not part of the current 1.2.1 release.
However, you can find an example in the current SNAPSHOT version: LDAExample.scala
You can also find useful background in the SPARK-1405 issue.
Until it is released, the simplest approach is probably to copy the relevant LDA classes into your project, as if you had written them yourself.
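Once those classes (or an official 1.3+ build) are on your classpath, training looks roughly like the sketch below. The topic count, iteration count, and function name are placeholders, and the corpus is assumed to already be (document id, term-count vector) pairs as in the vectorization sketch above:

    import org.apache.spark.mllib.clustering.{LDA, LDAModel}
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // Assumes `corpus` already holds (document id, term-count vector) pairs.
    def runLda(corpus: RDD[(Long, Vector)]): LDAModel = {
      val model = new LDA()
        .setK(10)              // number of topics to discover (placeholder)
        .setMaxIterations(50)  // iterations; tune for your corpus
        .run(corpus)

      // Show the 5 heaviest terms (by vocabulary index) for each topic.
      model.describeTopics(maxTermsPerTopic = 5).zipWithIndex.foreach {
        case ((termIndices, weights), topic) =>
          val top = termIndices.zip(weights)
            .map { case (i, w) => f"$i%d:$w%.3f" }.mkString(", ")
          println(s"Topic $topic: $top")
      }
      model
    }

Note that the EM-based implementation introduced in SPARK-1405 expects unique, non-negative document IDs, which zipWithIndex in the earlier sketch provides.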
Upvotes: 3