Sal

Reputation: 187

Can someone explain this Scala aggregate function with two initial values?

I am very new to Scala. I ran into this problem while working in Spark, which also uses Scala for performing operations on RDDs.

Until now, I have only seen aggregate functions with a single initial value (i.e. some-input.aggregate(Initial-value)((acc,value)=>(acc+value))), but this program has two initial values (0,0).

As far as I understand, this program calculates the average by keeping a running sum and a count of the elements seen so far.

val result = input.aggregate((0, 0))(
               (acc, value) => (acc._1 + value, acc._2 + 1),
               (acc1, acc2) => (acc1._1 + acc2._1, acc1._2 + acc2._2))
val avg = result._1 / result._2.toDouble

I know that in foldLeft / aggregate we supply an initial value, so that an empty collection yields that default, and that both take an accumulator and a value.

But in this case we have two initial values, and the accumulator is accessing tuple values. Where is this tuple defined?

Can someone please explain this whole program line by line?

Upvotes: 1

Views: 597

Answers (1)

Yuval Itzchakov

Reputation: 149528

but this program has two initial values (0,0).

They aren't two parameters, they're one Tuple2:

input.aggregate((0, 0))

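The value passed to aggregate is surrounded by an additional pair of round brackets, (( )). The outer pair delimits the argument list, and the inner pair is the tuple literal (0, 0), which is syntactic sugar for Tuple2.apply(0, 0). This is where the tuple comes from, and it is why the accumulator can be read with acc._1 and acc._2.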
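As a minimal sketch in plain Scala (no Spark required; the names zero, explicit and next are only illustrative), the tuple literal and the explicit constructor call build the same value, and _1 / _2 read its two fields:

```scala
val zero = (0, 0)            // tuple literal, inferred type (Int, Int)
val explicit = Tuple2(0, 0)  // the same value built explicitly

// What the first function in the question does for a single value of 10:
// add the value to the first field, add 1 to the second field.
val next = (zero._1 + 10, zero._2 + 1)
```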

If you look at the method definition (I'm assuming this is RDD.aggregate), you'll see it takes a single parameter in the first argument list:

def aggregate[U](zeroValue: U)(seqOp: (U, T) ⇒ U, combOp: (U, U) ⇒ U)
                (implicit arg0: ClassTag[U]): U
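To tie the signature back to the code in the question, here is a rough sketch that mimics what aggregate does, using plain Scala collections instead of Spark. The two inner sequences standing in for RDD partitions, and the names partitions, zero, seqOp, combOp and perPartition, are assumptions made for illustration; a real RDD decides its own partitioning.

```scala
// Hypothetical stand-in for an RDD[Int] split into two partitions.
val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5))

// zeroValue: ONE value of type U = (Int, Int), read as (sum so far, count so far).
val zero: (Int, Int) = (0, 0)

// seqOp: (U, T) => U, folds one element into a partition's accumulator.
val seqOp: ((Int, Int), Int) => (Int, Int) =
  (acc, value) => (acc._1 + value, acc._2 + 1)

// combOp: (U, U) => U, merges the accumulators of two partitions.
val combOp: ((Int, Int), (Int, Int)) => (Int, Int) =
  (acc1, acc2) => (acc1._1 + acc2._1, acc1._2 + acc2._2)

// Step 1: each partition is folded on its own, starting from zero.
val perPartition = partitions.map(_.foldLeft(zero)(seqOp)) // Seq((6, 3), (9, 2))

// Step 2: the per-partition results are merged into one (sum, count) pair.
val result = perPartition.foldLeft(zero)(combOp)           // (15, 5)

// Step 3: sum / count; toDouble avoids integer division.
val avg = result._1 / result._2.toDouble                   // 3.0
```

Here T is Int and U is (Int, Int). Because the zero value is used both as the starting accumulator within each partition and again when merging, it should be neutral for both functions, which (0, 0) is.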

Upvotes: 6
