Srinivasarao Daruna

Reputation: 3374

understanding aggregate in Scala

I am trying to understand aggregate in Scala. I understood the logic with one example, but the result of the second one I tried confused me.

Please let me know where I went wrong.

Code:

val list1 = List("This", "is", "an", "example");

val b = list1.aggregate(1)(_ * _.length(), _ * _)

1 * "This".length = 4

1 * "is".length = 2

1 * "an".length = 2

1 * "example".length = 7

4 * 2 = 8 , 2 * 7 = 14

8 * 14 = 112

The output was also 112. But for the code below:

val c = list1.aggregate(1)(_ * _.length(), _ + _)

I thought it would go like this: 4, 2, 2, 7

4 + 2 = 6

2 + 7 = 9

6 + 9 = 15,

but the output was still 112.

It seems to apply only the operation I passed as seqop, here _ * _.length

Could you please explain or correct me where I went wrong?

Upvotes: 2

Views: 514

Answers (1)

jb.cdnr

Reputation: 33

aggregate should only be used with associative and commutative operations. Let's look at the signature of the function:

def aggregate[B](z: ⇒ B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B

B can be seen as an accumulator (and will be the type of your output). You give an initial value z, then the first function (seqop) says how to add a value of type A to this accumulator, and the second (combop) says how to merge two accumulators. Scala "chooses" a way to aggregate your collection, but if your aggregation is not associative and commutative, the output is not deterministic because the order matters. Look at this example:

val l = List(1, 2, 3, 4)
l.aggregate(0)(_ + _, _ * _)

If we create one accumulator and then aggregate all the values, we get 0 + 1 + 2 + 3 + 4 = 10, but if we decide to parallelize the process by splitting the list in halves, we could get (0 + 1 + 2) * (0 + 3 + 4) = 21.
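To make the two evaluation orders concrete, here is a minimal sketch that simulates both by hand with foldLeft instead of calling aggregate (which is deprecated for sequential collections in newer Scala versions). The object name and the even split are my own choices for illustration; a real parallel collection may split the data differently.

```scala
object AggregateOrderSketch {
  val l = List(1, 2, 3, 4)

  // seqop adds one element to an accumulator; combop merges two accumulators.
  val seqop: (Int, Int) => Int = _ + _
  val combop: (Int, Int) => Int = _ * _

  // Strategy 1: a single accumulator walks the whole list, combop is never called.
  // 0 + 1 + 2 + 3 + 4
  val sequential: Int = l.foldLeft(0)(seqop)

  // Strategy 2: split in halves (assumed split point), fold each half starting
  // from its own copy of z = 0, then merge the two partial results with combop.
  // (0 + 1 + 2) * (0 + 3 + 4)
  val (left, right) = l.splitAt(l.length / 2)
  val split: Int = combop(left.foldLeft(0)(seqop), right.foldLeft(0)(seqop))
}
```

Here AggregateOrderSketch.sequential is 10 and AggregateOrderSketch.split is 21, so the answer depends on how the work is divided.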

What actually happens is that, for List, aggregate is the same as foldLeft: the elements are folded one by one with the first function and the second function is never called. This explains why changing your second function didn't change the output. Where aggregate becomes useful is in Spark, for example, or other distributed environments, where it may help to do the folding on each partition independently and then combine the results with the second function.
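Applied to the list from the question, this can be sketched as follows. It assumes the foldLeft equivalence described above; the object name is made up for the example, and the map/sum line is only one possible way to get the 15 the question expected.

```scala
object QuestionListSketch {
  val list1 = List("This", "is", "an", "example")

  // On a List the combine function is ignored, so both calls in the question
  // reduce to this one left fold: (((1 * 4) * 2) * 2) * 7
  val folded: Int = list1.foldLeft(1)(_ * _.length)

  // The sum of the lengths (4 + 2 + 2 + 7) can be computed directly instead.
  val sumOfLengths: Int = list1.map(_.length).sum
}
```

QuestionListSketch.folded is 112, matching both outputs in the question, and QuestionListSketch.sumOfLengths is 15.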

Upvotes: 1
