Reputation: 8281
I wrote the following code to sum by keys :
// ("A", 1) ("A", 4)
// ("B", 2) --> ("B", 2)
// ("A", 3)
def sumByKeys[A](tuples: List[(A, Long)]) : List[(A, Long)] = {
tuples.groupBy(_._1).mapValues(_.map(_._2).sum).toList
}
Is there a better way ?
Update : add .toList
at the end
Upvotes: 3
Views: 3049
Reputation: 20405
Using reduce,
def sumByKeys[A](tuples: List[(A, Long)]): List[(A, Long)] = {
tuples groupBy(_._1) map { _._2 reduce { (a,b) => (a._1, a._2+b._2) } } toList
}
a short for
def sumByKeys[A](tuples: List[(A, Long)]): List[(A, Long)] = {
tuples groupBy(_._1) map { case(k,v) => v reduce { (a,b) => (a._1, a._2+b._2) } } toList
}
Upvotes: 0
Reputation: 843
I guess this is the most simple immutable form without usage of any additional framework on top of scala.
UPD Actually forgot about final toList. This makes totally different picture in terms of perfomance, because of the mapValues view return type
You can try foldLeft, tailrec, something mutable and they have better perfomance
import annotation.tailrec
@tailrec
final def tailSum[A](tuples: List[(A, Long)], acc: Map[A, Long] = Map.empty[A, Long]): List[(A, Long)] = tuples match {
case (k, v) :: tail => tailSum(tail, acc + (k -> (v + acc.get(k).getOrElse(0L))))
case Nil => acc.toList
}
def foldLeftSum[A](tuples: List[(A, Long)]) = tuples.foldLeft(Map.empty[A, Long])({
case (acc, (k, v)) => acc + (k -> (v + acc.get(k).getOrElse(0L)))
}).toList
def mutableSum[A](tuples: List[(A, Long)]) = {
val m = scala.collection.mutable.Map.empty[A, Long].withDefault(_ => 0L)
for ((k, v) <- tuples) m += (k -> (v + m(k)))
m.toList
}
Updated perfomance testing is here https://gist.github.com/baskakov/8437895, briefly:
scala> avgTime("default", sumByKeys(tuples))
default avg time is 63 ms
scala> avgTime("tailrec", tailSum(tuples))
tailrec avg time is 48 ms
scala> avgTime("foldleft", foldLeftSum(tuples))
foldleft avg time is 45 ms
scala> avgTime("mutableSum", mutableSum(tuples))
mutableSum avg time is 41 ms
Upvotes: 2
Reputation: 16412
A solution very similar to yours:
def sumByKeys[A](tuples: List[(A, Long)]): List[(A, Long)] =
tuples groupBy (_._1) map { case (k, v) => (k, v.map(_._2).sum) } toList
val l: List[(String, Long)] = List(("A", 1), ("B", 2), ("A", 3))
sumByKeys(l)
// result:
// List[(String, Long)] = List((A,4), (B,2))
What's interesting is that in your solution you use def mapValues[C](f: (B) ⇒ C): Map[A, C]
which according to docs has "lazy" evaluation: "Transforms this map by applying a function to every retrieved value."
On the other hand def map[B](f: (A) ⇒ B): Map[B]
will build a new collection: "Builds a new collection by applying a function to all elements of this immutable map."
So depending on your needs you could be lazily evaluating a large map, or eagerly evaluating a small one.
Upvotes: 1
Reputation: 5426
The best I can think of gets you slightly better performance and save two characters:
def sumByKeys[A](tuples: List[(A, Long)]) : List[(A, Long)] = {
tuples.groupBy(_._1).mapValues(_.unzip._2.sum)
}
On my machine with Bask.ws' benchmark it took 11ms instead of 13ms without the unzip
.
EDIT: In fact I think the performance must be the same... don't know where those 2ms come from
Upvotes: 1