Felipe Winsberg

Reputation: 23

Flattening a List of RDDs

I have a collection of RDDs:

val rddList = scala.collection.mutable.ListBuffer[RDD[Data]]()

that contains multiple RDDs of the same Data type, each created from a different source. I need to combine these RDDs into a single RDD.

If I call rddList.flatten and then take the first element, will that accomplish what I want?

Upvotes: 1

Views: 901

Answers (1)

koiralo

Reputation: 23099

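rddList.flatten will not work here, because an RDD is not a Scala collection that flatten can unpack. Instead, reduce the list with union to create a single RDD from a list of RDDs. Below is a simple example.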

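import scala.collection.mutable.ListBuffer

val r1 = spark.sparkContext.parallelize(1 to 5)
val r2 = spark.sparkContext.parallelize(5 to 10)
val r3 = spark.sparkContext.parallelize(10 to 15)

val list = ListBuffer(r1, r2, r3)

// Combine the RDDs pairwise with union, then print every element on the driver
list.reduce(_ union _).collect().foreach(println)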
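
As an alternative to the reduce shown above, SparkContext.union accepts the whole sequence of RDDs at once and builds a single union instead of a chain of nested pairwise unions. The following is a minimal sketch, not a definitive implementation. It assumes a local SparkSession and uses Int values in place of the question's Data type; the app name and the name combined are made up for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.collection.mutable.ListBuffer

// Assumption: a local session for illustration; a real job would reuse its own.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("union-sketch") // hypothetical app name
  .getOrCreate()
val sc = spark.sparkContext

// Stand-in for the question's rddList, with Int instead of Data
val rddList = ListBuffer[RDD[Int]](
  sc.parallelize(1 to 5),
  sc.parallelize(5 to 10),
  sc.parallelize(10 to 15)
)

// SparkContext.union takes a Seq of RDDs and returns one RDD over all of them
val combined: RDD[Int] = sc.union(rddList.toSeq)

// union keeps duplicates (5 and 10 each appear twice), so this is 5 + 6 + 6 = 17
println(combined.count())
```

Keep in mind that reduce throws an exception on an empty list, so if rddList can be empty it is worth checking rddList.nonEmpty before using the reduce version. If duplicates across sources are not wanted, calling distinct() on the combined RDD removes them.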

Hope this helps!

Upvotes: 3
