jiny

Reputation: 11

Run LDA algorithm on Spark 2.0

I use Spark 2.0.0 and I'd like to train an LDA model on a Tweets dataset. When I try to execute

val ldaModel = new LDA().setK(3).run(corpus)

I get this error

error: reference to LDA is ambiguous;
it is imported twice in the same scope by import org.apache.spark.ml.clustering.LDA and import org.apache.spark.mllib.clustering.LDA

Could someone please help me? Thanks!

Upvotes: 1

Views: 253

Answers (1)

Alexey Svyatkovskiy

Reputation: 646

It looks like you have both of the following import statements:

import org.apache.spark.ml.clustering.LDA
import org.apache.spark.mllib.clustering.LDA

You would need to remove one of them.
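If you really need both classes in the same scope, Scala's import renaming can also resolve the ambiguity; a minimal sketch:

import org.apache.spark.ml.clustering.LDA
// rename the RDD-based class so the two names no longer clash
import org.apache.spark.mllib.clustering.{LDA => MLlibLDA}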

If you are using Spark ML (the DataFrame-based API), the proper syntax would be:

import org.apache.spark.ml.clustering.LDA

/*feature extraction step*/

val lda = new LDA().setK(3)
val model = lda.fit(corpus)
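To make the feature extraction step concrete, here is one possible shape of it for the DataFrame API. This is only a sketch that assumes a spark-shell / SparkSession context; the tweets DataFrame, its "text" column, and the chosen parameters are illustrative assumptions:

import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer}

// hypothetical tweets DataFrame with an "id" and a raw "text" column
val tweets = spark.createDataFrame(Seq(
  (0L, "spark makes distributed topic modeling easy"),
  (1L, "training an lda model on tweets with spark"),
  (2L, "clustering short text into three topics")
)).toDF("id", "text")

// split the raw text into tokens
val tokenizer = new RegexTokenizer().setInputCol("text").setOutputCol("words")
val words = tokenizer.transform(tweets)

// turn the tokens into term-count vectors; ml LDA reads the "features" column by default
val vectorizer = new CountVectorizer().setInputCol("words").setOutputCol("features")
val corpus = vectorizer.fit(words).transform(words)

val lda = new LDA().setK(3).setMaxIter(10)
val model = lda.fit(corpus)

// inspect the top terms per topic
model.describeTopics(5).show(false)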

If you are using the RDD-based API (spark.mllib), then you would have to write:

import org.apache.spark.mllib.clustering.LDA

/*feature extraction step*/

val lda = new LDA().setK(3)
val model = lda.run(corpus)
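For the RDD-based API, the corpus must be an RDD of (document id, term-count vector) pairs. A minimal sketch, assuming a spark-shell context and with toy hand-built vectors standing in for the real feature extraction over the tweets:

import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

// toy term-count vectors standing in for the real tweet features
val docs = sc.parallelize(Seq(
  Vectors.dense(1.0, 0.0, 3.0, 2.0),
  Vectors.dense(0.0, 2.0, 1.0, 0.0),
  Vectors.dense(4.0, 1.0, 0.0, 1.0)
))

// mllib LDA expects an RDD[(Long, Vector)]: a document id paired with its vector
val corpus = docs.zipWithIndex.map { case (vec, id) => (id, vec) }

val lda = new LDA().setK(3)
val model = lda.run(corpus)

// topicsMatrix is a term-by-topic matrix describing the learned topics
println(model.topicsMatrix)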

Upvotes: 1
