Reputation: 23
I have a collection of RDDs:
val rddList = scala.collection.mutable.ListBuffer[RDD[Data]]()
that contains multiple RDDs of the same kind of Data, but created from different sources. I need to combine these RDDs into a single RDD.
If I call rddList.flatten and then take the first element, will that accomplish what I want?
Upvotes: 1
Views: 901
Reputation: 23099
To create a single RDD from a list of RDDs, reduce the list with union, which merges the RDDs two at a time. Below is a simple example.
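import scala.collection.mutable.ListBuffer
// Assumes an existing SparkSession named spark (as in spark-shell)
val r1 = spark.sparkContext.parallelize(1 to 5)
val r2 = spark.sparkContext.parallelize(5 to 10)
val r3 = spark.sparkContext.parallelize(10 to 15)
val list = ListBuffer(r1, r2, r3)
// Fold the list pairwise with union; union keeps duplicates, so 5 and 10 are printed twice
list.reduce(_ union _).collect().foreach(println)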
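A variant worth considering when the list is long or may be empty: SparkContext.union accepts a whole Seq[RDD[T]] and builds one flat UnionRDD, whereas reduce(_ union _) nests one union per element, so the lineage grows with the length of the list. Also, reduce throws an UnsupportedOperationException on an empty collection. The sketch below is only an illustration under stated assumptions: it runs a local SparkSession, the Data case class is a hypothetical stand-in for the question's type, and the combine helper name is made up.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.collection.mutable.ListBuffer
import scala.reflect.ClassTag

// Hypothetical record type standing in for the question's Data
case class Data(source: String, value: Int)

object CombineRdds {
  // Hypothetical helper: merge any number of RDDs into one.
  // SparkContext.union builds a single union over all inputs
  // instead of a chain of nested two-way unions.
  def combine[T: ClassTag](sc: SparkContext, rdds: Seq[RDD[T]]): RDD[T] =
    if (rdds.isEmpty) sc.emptyRDD[T] // reduce would throw on an empty list
    else sc.union(rdds)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session just for the demo
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("combine-rdds")
      .getOrCreate()
    val sc = spark.sparkContext

    // RDDs of the same type built from different (here: in-memory) sources
    val rddList = ListBuffer[RDD[Data]]()
    rddList += sc.parallelize(Seq(Data("a", 1), Data("a", 2)))
    rddList += sc.parallelize(Seq(Data("b", 3)))
    rddList += sc.parallelize(Seq(Data("c", 4), Data("c", 5)))

    // toSeq keeps this compiling on Scala 2.13, where Seq means immutable.Seq
    val combined = combine(sc, rddList.toSeq)
    println(combined.count()) // 5
    println(combine(sc, Seq.empty[RDD[Data]]).count()) // 0

    spark.stop()
  }
}
```

For a handful of RDDs both approaches give the same result; the flat union mainly matters when dozens or hundreds of RDDs are combined.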
Hope this helps!
Upvotes: 3