ZhongBot

Reputation: 129

Scala Spark - convert RDD[List[scala.Double]] to RDD[scala.Double]

I am calling the MLlib Statistics.corr() function and receiving the following error:

(x: org.apache.spark.api.java.JavaRDD[java.lang.Double], y: org.apache.spark.api.java.JavaRDD[java.lang.Double], method: String)scala.Double
(x: org.apache.spark.rdd.RDD[scala.Double], y: org.apache.spark.rdd.RDD[scala.Double], method: String)scala.Double
cannot be applied to (org.apache.spark.rdd.RDD[List[scala.Double]], org.apache.spark.rdd.RDD[List[scala.Double]], String)

Here a and b are both of type RDD[List[scala.Double]], and the call is:

println(Statistics.corr(a, b, "pearson"))

What do I need to do to convert these RDDs into the input type that corr() expects?

Upvotes: 1

Views: 952

Answers (2)

Zoltán

Reputation: 22156

As suggested in this answer, you want to flatten your RDDs. There is no flatten method on RDD, so use flatMap(identity) instead:

println(Statistics.corr(a.flatMap(identity), b.flatMap(identity), "pearson"))

Upvotes: 0

mattinbits

Reputation: 10428

Try using flatMap with the identity function:

val doubleRDD = listDoubleRDD.flatMap(identity)
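
For context, here is a minimal self-contained sketch of the whole flow, assuming (as the error message indicates) that a and b are RDD[List[Double]]; the sample data, object name, and local SparkContext setup are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

object CorrFlattenExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("corr-flatten").setMaster("local[*]"))

    // Hypothetical stand-ins for the question's a and b: RDD[List[Double]]
    val a: RDD[List[Double]] = sc.parallelize(Seq(List(1.0, 2.0), List(3.0, 4.0)))
    val b: RDD[List[Double]] = sc.parallelize(Seq(List(2.0, 4.0), List(6.0, 8.0)))

    // flatMap(identity) unwraps each List, turning RDD[List[Double]] into RDD[Double],
    // which matches the corr(RDD[Double], RDD[Double], String) overload
    val aFlat: RDD[Double] = a.flatMap(identity)
    val bFlat: RDD[Double] = b.flatMap(identity)

    println(Statistics.corr(aFlat, bFlat, "pearson"))  // 1.0 for this perfectly linear sample data

    sc.stop()
  }
}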

Upvotes: 4
