Reputation: 129
I am calling the MLlib Statistics.corr() function and receiving the following compile error:
(x: org.apache.spark.api.java.JavaRDD[java.lang.Double],y: org.apache.spark.api.java.JavaRDD[java.lang.Double],method: String)scala.Double
(x: org.apache.spark.rdd.RDD[scala.Double],y: org.apache.spark.rdd.RDD[scala.Double],method: String)scala.Double
cannot be applied to (org.apache.spark.rdd.RDD[List[scala.Double]], org.apache.spark.rdd.RDD[List[scala.Double]], String)
println(Statistics.corr(a, b, "pearson"))
How do I convert my data (RDD[List[Double]]) to the input type that corr() expects (RDD[Double])?
Upvotes: 1
Views: 952
Reputation: 22156
As suggested in this answer, you want to flatten your RDDs. Unfortunately, there is no flatten method on RDD, so you can use flatMap(identity) instead.
println(Statistics.corr(a.flatMap(identity), b.flatMap(identity), "pearson"))
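For context, here is a minimal self-contained sketch of the same idea. The sample values, the object name CorrExample, and the local[2] master are assumptions made for illustration; they are not from the question. It assumes spark-core and spark-mllib are on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

// Hypothetical driver object; the name and the data below are illustrative only.
object CorrExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CorrExample").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-ins for the question's `a` and `b`: RDDs whose elements are lists of doubles.
      val a: RDD[List[Double]] = sc.parallelize(Seq(List(1.0, 2.0), List(3.0, 4.0)), 2)
      val b: RDD[List[Double]] = sc.parallelize(Seq(List(2.0, 4.0), List(6.0, 8.0)), 2)

      // flatMap(identity) unpacks each List[Double] into its elements,
      // turning RDD[List[Double]] into RDD[Double].
      val x: RDD[Double] = a.flatMap(identity)
      val y: RDD[Double] = b.flatMap(identity)

      // Both arguments now match the RDD[Double] overload of corr.
      println(Statistics.corr(x, y, "pearson"))
    } finally {
      sc.stop()
    }
  }
}
```

One thing to watch: as far as I know, corr pairs the two RDDs element by element, so after flattening they need the same number of partitions and the same number of elements in each partition. That holds here because the corresponding lists in a and b have equal lengths and both RDDs are created with the same partition count.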
Upvotes: 0
Reputation: 10428
Try using flatMap with the identity function:
val doubleRDD = listDoubleRDD.flatMap(identity)
Upvotes: 4