Frank

Reputation: 4461

Spark SparkPi example

In the SparkPi example that ships with the Spark distribution, is the reduce on the RDD executed in parallel (each slice computing its own total), or not?

import scala.math.random

val count: Int = spark.sparkContext.parallelize(1 until n, slices).map { i =>
  // sample a random point in the square [-1, 1] x [-1, 1]
  val x = random * 2 - 1
  val y = random * 2 - 1
  // count the point if it falls inside the unit circle
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)

Upvotes: 0

Views: 316

Answers (1)

Anton Okolnychyi

Reputation: 966

Yes, it is.

By default, this example operates on 2 slices, so the collection is split into 2 partitions. Spark then executes the map transformation and the reduce action on each partition in parallel. Finally, Spark merges the per-partition results into the final value on the driver.

You can observe 2 tasks in the console output if the example is executed using the default config.
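The split-reduce-merge sequence can be sketched in plain Scala without Spark (the object and method names here are illustrative, and the partitioning mimics what `parallelize(1 until n, slices)` does conceptually, not Spark's exact internals):

```scala
import scala.util.Random

object PartitionedReduce {
  // Simulate parallelize(1 until n, slices).map(...).reduce(_ + _):
  // split the range into slices, reduce each slice, then merge the partials.
  def run(n: Int, slices: Int): (Int, Double) = {
    val rng = new Random(42)
    val sliceSize = ((n - 1) + slices - 1) / slices
    val partitions = (1 until n).grouped(sliceSize).toSeq

    // Each partition computes its own partial count
    // (Spark would run these as parallel tasks, one per partition).
    val partialCounts = partitions.map { part =>
      part.map { _ =>
        val x = rng.nextDouble() * 2 - 1
        val y = rng.nextDouble() * 2 - 1
        if (x * x + y * y < 1) 1 else 0
      }.reduce(_ + _)
    }

    // The partial results are merged into the final value.
    val count = partialCounts.reduce(_ + _)
    (partitions.size, 4.0 * count / (n - 1))
  }

  def main(args: Array[String]): Unit = {
    val (parts, pi) = run(100000, 2)
    println(s"partitions=$parts, pi estimate=$pi")
  }
}
```

With 2 slices you get 2 partial counts, matching the 2 tasks you see in the console.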

Upvotes: 2
