Reputation: 1357
I have a variable of the type Seq[Seq[(Double, Double)]]
:
val output: Seq[Seq[(Double, Double)]] = runStreams(ssc, numBatches, numBatches)
Now I want to apply the function RegressionMetrics
that takes RDD[(Double, Double)]
as an input:
val metrics = new RegressionMetrics(output)
How to transform Seq[Seq[(Double, Double)]]
to RDD[(Double, Double)]` in order to be able to use functions of the class RegressionMetrics?
Upvotes: 0
Views: 186
Reputation: 37852
RDD
is Apache Spark's abstraction for a Distributed Resilient Dataset
To create an RDD
you'll need an instance of SparkContext
, which can be thought of as a "connection" or "handle" to a cluster running Apache Spark.
Assuming:
SparkContext
(Double, Double)
values, ignoring the way these are currently "split" into sub-sequences in Seq[Seq[(Double, Double)]]
You can create an RDD as follows:
val sc: SparkContext = ???
val output: Seq[Seq[(Double, Double)]] = ???
val rdd: RDD[(Double, Double)] = sc.parallelize(output.flatten)
Upvotes: 1