Anthony

found: org.apache.spark.sql.Dataset[(Double, Double)] required: org.apache.spark.rdd.RDD[(Double, Double)]

I am getting the following compile error:

 found   : org.apache.spark.sql.Dataset[(Double, Double)]
 required: org.apache.spark.rdd.RDD[(Double, Double)]
    val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel)

It is raised by the following code:

val testScoreAndLabel = testResults.
    select("Label","ModelProbability").
    map{ case Row(l:Double,p:Vector) => (p(1),l) }
val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel)

From the error, it seems that testScoreAndLabel is of type sql.Dataset, but BinaryClassificationMetrics expects an RDD.

How can I convert a sql.Dataset into an RDD?

Answers (1)

mrsrinivas

I'd do something like this:

val testScoreAndLabel = testResults.
    select("Label","ModelProbability").
    map{ case Row(l:Double,p:Vector) => (p(1),l) }

Now convert testScoreAndLabel to an RDD by calling testScoreAndLabel.rdd:

val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel.rdd)
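
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The testResults data is a made-up stand-in for the asker's real DataFrame, the object name and the local SparkSession settings are assumptions, and Vector is assumed to be org.apache.spark.ml.linalg.Vector (the type an ML pipeline usually puts in a probability column). This variant calls .rdd before map, so the mapping happens on the RDD and no Dataset encoder is needed for the tuple; calling .rdd after map, as above, should work as well.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical object name; only the .rdd call comes from the answer.
object DatasetToRddSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dataset-to-rdd-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up stand-in for testResults: a label plus a two-class probability vector.
    val testResults = Seq(
      (1.0, Vectors.dense(0.2, 0.8)),
      (0.0, Vectors.dense(0.9, 0.1)),
      (1.0, Vectors.dense(0.4, 0.6)),
      (0.0, Vectors.dense(0.7, 0.3))
    ).toDF("Label", "ModelProbability")

    // Drop down to the underlying RDD[Row] first, then map to (score, label) pairs.
    val testScoreAndLabel: RDD[(Double, Double)] = testResults
      .select("Label", "ModelProbability")
      .rdd
      .map { case Row(l: Double, p: Vector) => (p(1), l) }

    // BinaryClassificationMetrics now receives the RDD it expects.
    val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel)
    println(s"Area under ROC: ${testMetrics.areaUnderROC()}")

    spark.stop()
  }
}
```

The explicit RDD[(Double, Double)] type annotation is optional, but it makes the compiler flag the mismatch at the definition rather than at the BinaryClassificationMetrics call.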

See the Spark API documentation for Dataset.rdd.