sparklearner

Reputation: 403

Spark Fold vs Reduce in performance?

In a big data processing job, does the function `fold` have lower computational performance than the function `reduce`?

For instance, I have the following two functions:

    array1.indices.zip(array1).map(x => x._1 * x._2).reduce(_ + _)

    array1.indices.zip(array1).map(x => x._1 * x._2).fold(0.0) {_ + _}

`array1` is a very large RDD. Which function has better computational performance, given the same cluster settings?
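For concreteness, both expressions compute the same index-weighted sum. A minimal local sketch, using a small `Vector` in place of the huge RDD (an assumption purely for illustration):

```scala
object FoldVsReduceQuestion {
  // Stand-in for the huge RDD: a small local collection.
  val array1: Vector[Double] = Vector(2.0, 3.0, 4.0)

  // Pair each element with its index, multiply, then combine.
  val viaReduce: Double =
    array1.indices.zip(array1).map(x => x._1 * x._2).reduce(_ + _)

  val viaFold: Double =
    array1.indices.zip(array1).map(x => x._1 * x._2).fold(0.0)(_ + _)

  def main(args: Array[String]): Unit = {
    // 0*2.0 + 1*3.0 + 2*4.0 = 11.0 for both.
    println(viaReduce)
    println(viaFold)
  }
}
```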

Upvotes: 2

Views: 1506

Answers (1)

Justin Pihony

Reputation: 67135

This is indeed the same question as the one pointed out by muhuk, since the guts of the Spark implementation are merely calls into Scala's iterator methods.

fold from source:

    (iter: Iterator[T]) => iter.fold(zeroValue)(cleanOp)

reduce from source:

    iter =>
      if (iter.hasNext) Some(iter.reduceLeft(cleanF))
      else None

So both are primarily just calling into the Scala iterator implementations, and you should not expect a meaningful performance difference between them; `fold` simply adds one extra combination with the zero value per partition.
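A quick way to see what each path boils down to is to run the two iterator operations locally (a sketch of the per-partition behavior, not an actual Spark job):

```scala
object IteratorFoldVsReduce {
  val data: Seq[Double] = Seq(1.0, 2.0, 3.0)

  // What Spark's fold does per partition: Iterator.fold with a zero value.
  val folded: Double = data.iterator.fold(0.0)(_ + _)

  // What Spark's reduce does per partition: reduceLeft wrapped in Option,
  // so that empty partitions can be skipped.
  val reduced: Option[Double] = {
    val iter = data.iterator
    if (iter.hasNext) Some(iter.reduceLeft(_ + _)) else None
  }

  // The main semantic difference: fold is defined on an empty iterator
  // (it returns the zero value), while reduceLeft would throw.
  val emptyFolded: Double = Seq.empty[Double].iterator.fold(0.0)(_ + _)

  def main(args: Array[String]): Unit = {
    println(folded)      // 6.0
    println(reduced)     // Some(6.0)
    println(emptyFolded) // 0.0
  }
}
```

The `Some`/`None` wrapping in `reduce` exists because a partition can be empty; `fold` avoids that case by always starting from `zeroValue`.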

Upvotes: 1
