Reputation: 187
I am very new to Scala. I was trying to solve this problem in Spark, which also uses Scala for performing operations on RDDs.
Until now, I have only seen aggregate functions with a single initial value (i.e. some-input.aggregate(Initial-value)((acc,value)=>(acc+value))), but this program has two initial values (0,0).
As per my understanding this program is for calculating the running average and keeping track of the count so far.
val result = input.aggregate((0, 0))(
(acc, value) => (acc._1 + value, acc._2 + 1),
(acc1, acc2) => (acc1._1 + acc2._1, acc1._2 + acc2._2))
val avg = result._1 / result._2.toDouble
I know that in foldLeft / aggregate we supply initial values so that, in the case of an empty collection, we get the default value, and that both have an accumulator and a value part.
But in this case, we have two initial values, and the accumulator is accessing tuple values. Where is this tuple defined?
Can someone please explain this whole program line by line?
Upvotes: 1
Views: 597
Reputation: 149528
but this program has two initial values (0,0).
They aren't two parameters, they're one Tuple2:
input.aggregate((0, 0))
The value passed to aggregate is surrounded by additional round brackets, (( )), which are used as syntactic sugar for Tuple2.apply. This is where you're seeing the tuple come from.
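As a minimal sketch of that sugar in plain Scala (no Spark needed; the names pair, sameAsApply, first and second are made up for illustration):

```scala
// (0, 0) is a literal for a single Tuple2[Int, Int] value.
val pair: (Int, Int) = (0, 0)

// The literal and the explicit Tuple2(...) call build equal values.
val sameAsApply: Boolean = pair == Tuple2(0, 0)

// _1 and _2 read the first and second components,
// which is what acc._1 and acc._2 do in the question's code.
val first: Int = pair._1
val second: Int = pair._2
```

So in the question's code, acc._1 is the sum so far and acc._2 is the count so far.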
If you look at the method definition (I'm assuming this is RDD.aggregate), you'll see it takes a single parameter in the first argument list:
def aggregate[U](zeroValue: U)(seqOp: (U, T) ⇒ U, combOp: (U, U) ⇒ U)
(implicit arg0: ClassTag[U]): U
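To see how the three arguments fit together, here is a rough sketch that imitates the call with plain Scala collections instead of an RDD. The sample numbers and the split into two "partitions" are assumptions for illustration only. Real Spark decides the partitioning itself and runs seqOp on the executors.

```scala
// Hypothetical input, split into two chunks to stand in for RDD partitions.
val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4))

// zeroValue: one tuple holding (sum so far, count so far).
val zero: (Int, Int) = (0, 0)

// seqOp: fold one element into a partition's accumulator.
val seqOp: ((Int, Int), Int) => (Int, Int) =
  (acc, value) => (acc._1 + value, acc._2 + 1)

// combOp: merge the accumulators of two partitions.
val combOp: ((Int, Int), (Int, Int)) => (Int, Int) =
  (acc1, acc2) => (acc1._1 + acc2._1, acc1._2 + acc2._2)

// Each partition is folded on its own, starting from zero ...
val perPartition: Seq[(Int, Int)] = partitions.map(_.foldLeft(zero)(seqOp))
// perPartition == Seq((3, 2), (7, 2))

// ... and the partial results are merged into one (sum, count) pair.
val result: (Int, Int) = perPartition.foldLeft(zero)(combOp)
// result == (10, 4)

val avg: Double = result._1 / result._2.toDouble
// avg == 2.5
```

Here U is (Int, Int) and T is Int, which is why the accumulator in both functions is a tuple while value is a plain number.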
Upvotes: 6