scala parallel collections not consistent

Question

I am getting inconsistent answers from the following code which I find odd.

import scala.math.pow

val p = 2
val a = Array(1,2,3)

println(a.par
    .aggregate("0")((x, y) => s"$y pow $p; ", (x, y) => x + y))

for (i <- 1 to 100) {
  println(a.par
    .aggregate(0.0)((x, y) => pow(y, p), (x, y) => x + y) == 14)
}

a.map(x => pow(x,p)).sum

In the code the a.par ... computes 14 or 10. Can anyone provide an explanation for why it is computing inconsistently?

fxlae · Accepted Answer

In your "seqop" function, that is the first function you pass to aggregate, you define the logic that is used to combine elements within the same partition. Your function looks like this:

(x, y) => pow(y, p)

The problem is that you don't accumulate the results of a partition. Instead, you throw away your accumulator x. Every time you get 10 as a result, the calculation 2^2 was dropped.

If you change your function to take the accumulated value into account, you will get 14 every time:

(x, y) => x + pow(y, p)

scala parallel collections not consistent

Answers (2)

Related Questions