gyoho

Reputation: 899

Scala - Reduce list of tuples by key

I have a list of tuples, each containing a userId and a point value. I want to combine (reduce) this list by key, summing the points for each userId.

val points: List[(Int, Double)] = List(
  (1, 1.0),
  (2, 3.2),
  (4, 2.0),
  (1, 4.0),
  (2, 6.8)
)

The expected result should look like:

List((1, 5.0), (2, 10.0), (4, 2.0))

I tried groupBy with mapValues, but got an error:

val aggrPoint: Map[Int, Double] = points.groupBy(_._1).mapValues(seq => seq.reduce(_._2 + _._2))

Error:(16, 180) type mismatch;
 found   : Double
 required: (Int, Double)

What am I doing wrong, and is there an idiomatic way to achieve this?

P.S. I found that in Spark, aggregateByKey does this job. Is there a built-in method in plain Scala?

Upvotes: 1

Views: 12079

Answers (3)

Puneeth Reddy V

Reputation: 1568

Using collect, with a pattern that destructures each (userId, tuples) pair:

points.groupBy(_._1).collect {
  case (userId, tuples) => userId -> tuples.map(_._2).sum
}.toList
//res1: List[(Int, Double)] = List((2,10.0), (4,2.0), (1,5.0))

Upvotes: 0

Ramesh Maharjan

Reputation: 41957

What am I doing wrong, and is there an idiomatic way to achieve this?

Let's go step by step to see what you are doing wrong (I'm going to use the REPL).

First of all, let's define the points:

scala> val points: List[(Int, Double)] = List(
     |   (1, 1.0),
     |   (2, 3.2),
     |   (4, 2.0),
     |   (1, 4.0),
     |   (2, 6.8)
     | )
points: List[(Int, Double)] = List((1,1.0), (2,3.2), (4,2.0), (1,4.0), (2,6.8))

As you can see, you have a List[Tuple2[Int, Double]], so when you apply groupBy and then mapValues like this:

scala> points.groupBy(_._1).mapValues(seq => println(seq))
List((2,3.2), (2,6.8))
List((4,2.0))
List((1,1.0), (1,4.0))
res1: scala.collection.immutable.Map[Int,Unit] = Map(2 -> (), 4 -> (), 1 -> ())

You can see that seq is again a List[Tuple2[Int, Double]]; it just contains only the tuples that share the same key.

So when you apply seq.reduce(_._2 + _._2), the function you pass to reduce receives two Tuple2[Int, Double] values but returns a Double. reduce feeds each result back in as an input for the next step, so that function must return the same type it takes (here Tuple2[Int, Double]); that is exactly what the "found: Double, required: (Int, Double)" error is saying. That's the main issue. All you have to do is make the input and output types of the reduce function match.
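For comparison, here is a minimal sketch (not part of the original answer) using foldLeft instead of reduce. Because foldLeft takes an initial value, its accumulator can have a different type (Double) from the elements ((Int, Double)), which sidesteps the type mismatch entirely. The object name FoldPerKey is hypothetical.

```scala
object FoldPerKey {
  val points: List[(Int, Double)] = List(
    (1, 1.0),
    (2, 3.2),
    (4, 2.0),
    (1, 4.0),
    (2, 6.8)
  )

  // foldLeft starts from 0.0, so the accumulator is a Double
  // while each element is still an (Int, Double) tuple.
  val totals: Map[Int, Double] =
    points.groupBy(_._1).map { case (key, tuples) =>
      key -> tuples.foldLeft(0.0)((acc, tuple) => acc + tuple._2)
    }

  def main(args: Array[String]): Unit =
    println(totals)
}
```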

One way is to have the function return a Tuple2[Int, Double] as well:

scala> points.groupBy(_._1).mapValues(seq => seq.reduce{(x,y) => (x._1, x._2 + y._2)})
res6: scala.collection.immutable.Map[Int,(Int, Double)] = Map(2 -> (2,10.0), 4 -> (4,2.0), 1 -> (1,5.0))

But this isn't your desired output, so you can extract the Double value from the reduced Tuple2[Int, Double]:

scala> points.groupBy(_._1).mapValues(seq => seq.reduce{(x,y) => (x._1, x._2 + y._2)}._2)
res8: scala.collection.immutable.Map[Int,Double] = Map(2 -> 10.0, 4 -> 2.0, 1 -> 5.0)

Or you can simply map each tuple to its Double value before applying reduce:

scala> points.groupBy(_._1).mapValues(seq => seq.map(_._2).reduce(_ + _))
res3: scala.collection.immutable.Map[Int,Double] = Map(2 -> 10.0, 4 -> 2.0, 1 -> 5.0)
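Two caveats worth knowing here: in Scala 2.12 and earlier, mapValues returns a lazy view that re-applies the function every time a value is accessed, and in Scala 2.13 it is deprecated in favour of .view.mapValues(f). A Map also has no guaranteed ordering, while the expected result in the question is a List sorted by key. A minimal sketch, assuming sorted-by-userId output is what you want, that builds a strict map with map and then sorts (StrictSorted is a hypothetical name):

```scala
object StrictSorted {
  val points: List[(Int, Double)] = List(
    (1, 1.0), (2, 3.2), (4, 2.0), (1, 4.0), (2, 6.8)
  )

  // map (rather than mapValues) computes each sum once, eagerly;
  // sortBy(_._1) then orders the resulting pairs by userId.
  val result: List[(Int, Double)] =
    points
      .groupBy(_._1)
      .map { case (userId, tuples) => userId -> tuples.map(_._2).sum }
      .toList
      .sortBy(_._1)

  def main(args: Array[String]): Unit =
    println(result) // List((1,5.0), (2,10.0), (4,2.0))
}
```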

I hope this explanation makes your mistake clear and shows how the reduce function works.

Upvotes: 4

Leo C

Reputation: 22439

Inside mapValues, you can map the tuples to their second elements and then sum them:

points.groupBy(_._1).mapValues( _.map(_._2).sum ).toList
// res1: List[(Int, Double)] = List((2,10.0), (4,2.0), (1,5.0))
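On the P.S. question: if you are on Scala 2.13 or later, the standard library has groupMapReduce, which groups by a key, maps each element and reduces the mapped values per key in one call, similar in spirit to reduceByKey on Spark pair RDDs. A minimal sketch, assuming Scala 2.13+ (it will not compile on 2.12); GroupMapReduceExample is a hypothetical name:

```scala
object GroupMapReduceExample {
  val points: List[(Int, Double)] = List(
    (1, 1.0), (2, 3.2), (4, 2.0), (1, 4.0), (2, 6.8)
  )

  // key: the userId, map: the point value, reduce: add values sharing a key.
  val totals: Map[Int, Double] =
    points.groupMapReduce(_._1)(_._2)(_ + _)

  def main(args: Array[String]): Unit =
    println(totals.toList.sortBy(_._1))
}
```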

Upvotes: 3
