Klue
Klue

Reputation: 1357

How to use the function that expects RDD[(Double, Double)] as an input for a case of Seq[Seq[(Double, Double)]]?

I have a variable of the type Seq[Seq[(Double, Double)]]:

val output: Seq[Seq[(Double, Double)]] = runStreams(ssc, numBatches, numBatches)

Now I want to apply the function RegressionMetrics that takes RDD[(Double, Double)] as an input:

val metrics = new RegressionMetrics(output)

How to transform Seq[Seq[(Double, Double)]] to RDD[(Double, Double)]` in order to be able to use functions of the class RegressionMetrics?

Upvotes: 0

Views: 186

Answers (1)

Tzach Zohar
Tzach Zohar

Reputation: 37852

RDD is Apache Spark's abstraction for a Distributed Resilient Dataset

To create an RDD you'll need an instance of SparkContext, which can be thought of as a "connection" or "handle" to a cluster running Apache Spark.

Assuming:

  • You have an instantiated SparkContext
  • You want to treat your input as a "flat" sequence of (Double, Double) values, ignoring the way these are currently "split" into sub-sequences in Seq[Seq[(Double, Double)]]

You can create an RDD as follows:

val sc: SparkContext = ???
val output: Seq[Seq[(Double, Double)]] = ???

val rdd: RDD[(Double, Double)] = sc.parallelize(output.flatten)

Upvotes: 1

Related Questions