Reputation: 403
In a big data processing job, does the function "fold" have lower computational performance than the function "reduce"?
For instance, I have the following two functions:
array1.indices.zip(array1).map(x => x._1 * x._2).reduce(_ + _)
array1.indices.zip(array1).map(x => x._1 * x._2).fold(0.0) {_ + _}
array1 is a very large RDD. Which function has higher computation performance given the same cluster settings?
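For context, a minimal self-contained sketch of the same computation written directly against an RDD (assuming a local SparkContext named sc; note that indices exists on Scala collections but not on RDDs, so zipWithIndex is used here as the RDD equivalent, and the sample values are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("fold-vs-reduce").setMaster("local[*]"))

// Hypothetical stand-in for the question's array1.
val array1 = sc.parallelize(Array(1.0, 2.0, 3.0, 4.0))

// Multiply each value by its index, as in the question's code.
val products = array1.zipWithIndex.map { case (v, i) => i * v }

val viaReduce = products.reduce(_ + _)    // 20.0
val viaFold   = products.fold(0.0)(_ + _) // 20.0

Both calls return the same result; the question is whether one of them does more work.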
Upvotes: 2
Views: 1506
Reputation: 67135
This is indeed the same as the answer pointed out by muhuk, as the guts of the Spark implementation are merely calls into the corresponding Scala iterator methods.
fold
from source:
(iter: Iterator[T]) => iter.fold(zeroValue)(cleanOp)
reduce
from source:
iter =>
  if (iter.hasNext) Some(iter.reduceLeft(cleanF))
  else None
So both operations are primarily just calling into the Scala implementations, and any performance difference between fold and reduce should be negligible.
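To see this concretely, here is a small standalone sketch (sample values made up for illustration) reproducing both per-partition code paths on a plain Scala Iterator:

val data = Array(1.0, 2.0, 3.0, 4.0)

// fold path: iter.fold(zeroValue)(op)
val foldResult = data.iterator.fold(0.0)(_ + _)

// reduce path: hasNext guard, then reduceLeft
val it = data.iterator
val reduceResult = if (it.hasNext) Some(it.reduceLeft(_ + _)) else None

println(foldResult)   // 10.0
println(reduceResult) // Some(10.0)

Both walk the iterator once and apply the same binary operation element by element. The practical difference is semantic rather than computational: fold returns the zero value on an empty collection, while reduce on an empty RDD throws an UnsupportedOperationException.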
Upvotes: 1