Sugimiyanto

Reputation: 329

How to initialize an RDD with n pairs of zeros

I want to initialize an RDD that contains n pairs of zeros. For example, with n = 3 the expected result is:

init: RDD[(Long, Long)] = ((0,0), (0,0), (0,0))

The number of pairs n could be in the thousands, hundreds of thousands, or even millions. If I build the list with a for loop in Scala and then convert it to an RDD, it takes a long time:

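import org.apache.spark.rdd.RDD

// Built on the driver: each :+ copies the whole immutable List, so the loop is O(n^2)
var init: List[(Long, Long)] = List((0,0))
for(a <- 1 to 1000000){
  init = init :+ (0L,0L)
}
// Only after the list is complete is it shipped to the cluster
val pairRDD: RDD[(Long, Long)] = sc.parallelize(init)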

Can anybody point me to a faster way to do this?

Upvotes: 0

Views: 743

Answers (1)

akuiper

Reputation: 215117

You can use spark.range to build the RDD in parallel from the start, instead of creating a list on the driver first:

val rdd = spark.range(1000000).map(_ => (0, 0)).rdd
// rdd: org.apache.spark.rdd.RDD[(Int, Int)] = MapPartitionsRDD[13] at rdd at <console>:23

rdd.take(5)
// res9: Array[(Int, Int)] = Array((0,0), (0,0), (0,0), (0,0), (0,0))
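
Note that the snippet above produces an RDD[(Int, Int)], while the question asks for RDD[(Long, Long)]. A minimal sketch of one way to get the requested type is to use Long literals and SparkContext.range, which returns an RDD[Long] directly. The object name ZeroPairs, the helper zeroPairs, and the local[*] master are assumptions made for illustration, not part of the original answer:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object ZeroPairs {
  // Hypothetical helper: builds n pairs of (0L, 0L).
  // The pairs are generated inside the partitions, not collected on the driver first.
  def zeroPairs(spark: SparkSession, n: Long): RDD[(Long, Long)] =
    spark.sparkContext.range(0L, n).map(_ => (0L, 0L))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration; in spark-shell, reuse `spark` instead.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("zero-pairs")
      .getOrCreate()

    val pairRDD: RDD[(Long, Long)] = zeroPairs(spark, 1000000L)
    println(pairRDD.take(3).mkString(", "))

    spark.stop()
  }
}
```

In spark-shell, where `sc` is already defined, the same idea reduces to `sc.range(0L, n).map(_ => (0L, 0L))`.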

Upvotes: 4
