Reputation: 4461
In the SparkPi example that ships with the Spark distribution, is the reduce on the RDD executed in parallel (each slice computing its own partial total), or not?
import scala.math.random  // the `random` calls below come from scala.math

val count: Int = spark.sparkContext.parallelize(1 until n, slices).map { i =>
  val x = random * 2 - 1
  val y = random * 2 - 1
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
Upvotes: 0
Views: 316
Reputation: 966
Yes, it is.
By default, this example runs with 2 slices, so the input collection is split into 2 partitions. Spark then executes the map transformation and the per-partition part of the reduce action on each partition in parallel, and finally merges the partial results on the driver into the final value.
If you run the example with the default config, you can observe the 2 corresponding tasks in the console output.
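To make the evaluation model concrete, here is a plain-Scala sketch (no Spark required) of what happens conceptually: the range is split into `slices` chunks, each chunk is mapped and summed independently (in Spark, one task per partition), and the partial sums are then merged into the final count. The object and method names are made up for illustration; this is not Spark's actual implementation.

```scala
import scala.util.Random

// Hypothetical sketch of Spark's partitioned map + reduce for SparkPi.
object PartitionedReduceSketch {
  def countInCircle(n: Int, slices: Int, seed: Long): Int = {
    val rng = new Random(seed) // fixed seed so the sketch is repeatable
    val partitionSize = math.ceil((n - 1).toDouble / slices).toInt

    // Split 1 until n into `slices` chunks, like parallelize(1 until n, slices)
    val partitions = (1 until n).grouped(partitionSize).toSeq

    // Each partition computes its own partial sum
    // (in Spark, these would run as parallel tasks)
    val perPartition = partitions.map { part =>
      part.map { _ =>
        val x = rng.nextDouble() * 2 - 1
        val y = rng.nextDouble() * 2 - 1
        if (x * x + y * y < 1) 1 else 0
      }.sum
    }

    // Final merge of the partial results, as reduce(_ + _) does on the driver
    perPartition.sum
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    val count = countInCircle(n, slices = 2, seed = 42L)
    println(f"Pi is roughly ${4.0 * count / (n - 1)}%1.5f")
  }
}
```

The key point the sketch illustrates is that reduce is not purely sequential: each partition reduces its own elements, and only the per-partition results are combined at the end.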
Upvotes: 2