Reputation: 4461
In the SparkPi example that ships with the Spark distribution, is the reduce on the RDD executed in parallel (each slice computing its own partial total), or not?
import scala.math.random  // the `random` calls below come from scala.math

val count: Int = spark.sparkContext.parallelize(1 until n, slices).map { i =>
  val x = random * 2 - 1
  val y = random * 2 - 1
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
Upvotes: 0
Views: 316
Reputation: 966
Yes, it is.
By default, this example runs with 2 slices, so the input collection is split into 2 partitions. Spark then executes the map transformation and the per-partition part of the reduce action on each partition in parallel, and finally merges the partial results on the driver into the final value.
If you run the example with the default config, you can observe the 2 corresponding tasks in the console output.
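To make the evaluation model concrete, here is a plain-Scala sketch (no Spark required) of what happens conceptually: the range is split into `slices` chunks, each chunk is mapped and summed independently (in Spark, one task per partition), and the partial sums are then merged into the final count. The object and method names are made up for illustration; this is not Spark's actual implementation.

```scala
import scala.util.Random

// Hypothetical sketch of Spark's partitioned map + reduce for SparkPi.
object PartitionedReduceSketch {
  def countInCircle(n: Int, slices: Int, seed: Long): Int = {
    val rng = new Random(seed) // fixed seed so the sketch is repeatable
    val partitionSize = math.ceil((n - 1).toDouble / slices).toInt

    // Split 1 until n into `slices` chunks, like parallelize(1 until n, slices)
    val partitions = (1 until n).grouped(partitionSize).toSeq

    // Each partition computes its own partial sum
    // (in Spark, these would run as parallel tasks)
    val perPartition = partitions.map { part =>
      part.map { _ =>
        val x = rng.nextDouble() * 2 - 1
        val y = rng.nextDouble() * 2 - 1
        if (x * x + y * y < 1) 1 else 0
      }.sum
    }

    // Final merge of the partial results, as reduce(_ + _) does on the driver
    perPartition.sum
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    val count = countInCircle(n, slices = 2, seed = 42L)
    println(f"Pi is roughly ${4.0 * count / (n - 1)}%1.5f")
  }
}
```

The key point the sketch illustrates is that reduce is not purely sequential: each partition reduces its own elements, and only the per-partition results are combined at the end.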
Upvotes: 2