Reputation: 899
I have a list of tuples, each containing a userId and a point. I want to combine, or reduce, this list by key.
val points: List[(Int, Double)] = List(
(1, 1.0),
(2, 3.2),
(4, 2.0),
(1, 4.0),
(2, 6.8)
)
The expected result should look like:
List((1, 5.0), (2, 10.0), (4, 2.0))
I tried groupBy and mapValues, but got an error:
val aggrPoint: Map[Int, Double] = points.groupBy(_._1).mapValues(seq => seq.reduce(_._2 + _._2))
Error:(16, 180) type mismatch;
found : Double
required: (Int, Double)
What am I doing wrong, and is there an idiomatic way to achieve this?
P.S. I found that in Spark, aggregateByKey does this job. Is there a built-in method for this in Scala?
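For reference, the Scala standard library gained a close analogue in version 2.13: groupMapReduce, which groups by a key function, maps each element, and reduces the mapped values per key in one call. A minimal sketch, assuming Scala 2.13 or later (the method does not exist in 2.12 and earlier):

```scala
val points: List[(Int, Double)] = List((1, 1.0), (2, 3.2), (4, 2.0), (1, 4.0), (2, 6.8))

// groupMapReduce(key)(f)(reduce): key picks the userId, f picks the point,
// and reduce adds up the points that share a userId (Scala 2.13+ only).
val totals: Map[Int, Double] = points.groupMapReduce(_._1)(_._2)(_ + _)

// A Map has no guaranteed order, so sort by userId to get a deterministic List.
val totalsList: List[(Int, Double)] = totals.toList.sortBy(_._1)
println(totalsList) // List((1,5.0), (2,10.0), (4,2.0))
```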
Upvotes: 1
Views: 12079
Reputation: 1568
Using collect
points.groupBy(_._1).collect{
case e => e._1 -> e._2.map(_._2).sum
}.toList
//res1: List[(Int, Double)] = List((2,10.0), (4,2.0), (1,5.0))
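Since the pattern case e => matches every entry, collect here behaves exactly like map. A sketch of the same idea using map with a tuple pattern, which names the key and the group instead of using e._1 and e._2 (the names userId and pairs are illustrative):

```scala
val points: List[(Int, Double)] = List((1, 1.0), (2, 3.2), (4, 2.0), (1, 4.0), (2, 6.8))

// Destructure each (key, group) entry of the grouped Map.
val summed: List[(Int, Double)] =
  points.groupBy(_._1).map { case (userId, pairs) =>
    userId -> pairs.map(_._2).sum
  }.toList
```

As with the original, the order of the resulting list follows the Map's iteration order, so sort it (for example with sortBy(_._1)) if the order matters.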
Upvotes: 0
Reputation: 41957
What am I doing wrong, and is there an idiomatic way to achieve this?
Let's go step by step to see what you are doing wrong (I am going to use the REPL).
First of all, let's define the points:
scala> val points: List[(Int, Double)] = List(
| (1, 1.0),
| (2, 3.2),
| (4, 2.0),
| (1, 4.0),
| (2, 6.8)
| )
points: List[(Int, Double)] = List((1,1.0), (2,3.2), (4,2.0), (1,4.0), (2,6.8))
As you can see, you have a List[Tuple2[Int, Double]], so when you do groupBy and mapValues as follows:
scala> points.groupBy(_._1).mapValues(seq => println(seq))
List((2,3.2), (2,6.8))
List((4,2.0))
List((1,1.0), (1,4.0))
res1: scala.collection.immutable.Map[Int,Unit] = Map(2 -> (), 4 -> (), 1 -> ())
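The same inspection can be done without the side-effecting println by looking at the grouped map directly. Each value is the List[(Int, Double)] of whole tuples that share a key, kept in their original order:

```scala
val points: List[(Int, Double)] = List((1, 1.0), (2, 3.2), (4, 2.0), (1, 4.0), (2, 6.8))

// groupBy keeps the whole tuple in each group, not just the Double.
val grouped: Map[Int, List[(Int, Double)]] = points.groupBy(_._1)

println(grouped(1)) // List((1,1.0), (1,4.0))
println(grouped(2)) // List((2,3.2), (2,6.8))
```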
You can see that the seq object is again a List[Tuple2[Int, Double]], but it only contains the tuples grouped under one key.
So when you apply seq.reduce(_._2 + _._2), the reduce function takes two inputs of type Tuple2[Int, Double] but returns only a Double. reduce feeds each result back in as an input for the next step, so its function must return the same type it takes, here Tuple2[Int, Double]. That is exactly what the compiler reports with found: Double, required: (Int, Double). That's the main issue: all you have to do is make the input and output types of the reduce function match.
One way is to make the function return a Tuple2[Int, Double] as well:
scala> points.groupBy(_._1).mapValues(seq => seq.reduce{(x,y) => (x._1, x._2 + y._2)})
res6: scala.collection.immutable.Map[Int,(Int, Double)] = Map(2 -> (2,10.0), 4 -> (4,2.0), 1 -> (1,5.0))
But this isn't your desired output, so you can extract the Double value from the reduced Tuple2[Int, Double]:
scala> points.groupBy(_._1).mapValues(seq => seq.reduce{(x,y) => (x._1, x._2 + y._2)}._2)
res8: scala.collection.immutable.Map[Int,Double] = Map(2 -> 10.0, 4 -> 2.0, 1 -> 5.0)
Or you can simply map each tuple to its Double value before applying the reduce function:
scala> points.groupBy(_._1).mapValues(seq => seq.map(_._2).reduce(_ + _))
res3: scala.collection.immutable.Map[Int,Double] = Map(2 -> 10.0, 4 -> 2.0, 1 -> 5.0)
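A further option, swapped in here and not used in the question: foldLeft takes a separate start value, so its accumulator may have a different type from the elements. That sidesteps the type mismatch entirely, because the accumulator can be a plain Double while the elements stay tuples. A minimal sketch (map with a tuple pattern is used instead of mapValues only to keep the result a strict Map):

```scala
val points: List[(Int, Double)] = List((1, 1.0), (2, 3.2), (4, 2.0), (1, 4.0), (2, 6.8))

// foldLeft(z)(op): the accumulator (a Double starting at 0.0) has a
// different type from the elements ((Int, Double) tuples).
val aggrPoint: Map[Int, Double] =
  points.groupBy(_._1).map { case (userId, seq) =>
    userId -> seq.foldLeft(0.0)((acc, p) => acc + p._2)
  }
```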
I hope this explanation makes your mistake clear and shows how the reduce function works.
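A note on versions: Map.mapValues, used in the snippets above, is deprecated as of Scala 2.13, and in 2.12 it returns a lazy view that re-runs the function each time a value is accessed. A sketch of the same computation that compiles without the deprecation warning, assuming Scala 2.13 or later:

```scala
val points: List[(Int, Double)] = List((1, 1.0), (2, 3.2), (4, 2.0), (1, 4.0), (2, 6.8))

// .view.mapValues(...) is the 2.13 replacement; .toMap makes the result strict.
val aggrPoint: Map[Int, Double] =
  points.groupBy(_._1).view.mapValues(seq => seq.map(_._2).reduce(_ + _)).toMap
```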
Upvotes: 4
Reputation: 22439
You can map the tuples in mapValues to their second elements and then sum them as follows:
points.groupBy(_._1).mapValues( _.map(_._2).sum ).toList
// res1: List[(Int, Double)] = List((2,10.0), (4,2.0), (1,5.0))
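If you want something closer in spirit to Spark's aggregateByKey (a single pass with a zero value per key) without building the intermediate groups, a hand-written foldLeft into an immutable Map also works. This is a sketch of a swapped-in technique, not a library method; getOrElse supplies the 0.0 starting value the first time a userId is seen:

```scala
val points: List[(Int, Double)] = List((1, 1.0), (2, 3.2), (4, 2.0), (1, 4.0), (2, 6.8))

// One pass: add each point to the running total for its userId.
val totals: Map[Int, Double] =
  points.foldLeft(Map.empty[Int, Double]) { case (acc, (userId, point)) =>
    acc.updated(userId, acc.getOrElse(userId, 0.0) + point)
  }

// Sort by userId to get a deterministic List.
val result: List[(Int, Double)] = totals.toList.sortBy(_._1)
```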
Upvotes: 3