Reputation: 20410
I implemented four correct versions of this function. I'd like a semi-idiomatic Scala version that runs faster and more in line with the Java-like implementations.
groupByAndCount
: Cleanest and most elegant. Unfortunately, it's slow.
foldImmutable
: Fully internally immutable. Runs even slower.
iterateToMutable
: Simple mutable version. Still slow.
iterateToJavaMutable
: Uses a Java (mutable) HashMap which offers a compute function so the code can avoid separate get/set functions for each element iteration.
fixedTypeLongCustomMap
: This is using a custom non-generics collection it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap
and runs the fastest.
Here are some jmh benchmarks:
Benchmark Mode Cnt Score Error Units
FreqMapGenerationJava.fixedTypeLongCustomMap avgt 5 0.255 ± 0.061 s/op
FreqMapGenerationJava.foldImmutable avgt 5 3.728 ± 0.318 s/op
FreqMapGenerationJava.groupByAndCount avgt 5 1.315 ± 0.405 s/op
FreqMapGenerationJava.iterateToJavaMutable avgt 5 0.654 ± 0.080 s/op
FreqMapGenerationJava.iterateToMutable avgt 5 1.356 ± 0.240 s/op
Here is full Scala code:
def foldImmutable[A](l: Seq[A]): immutable.Map[A, Int] = {
def foldF(m: immutable.Map[A, Int], a: A): immutable.Map[A, Int] = {
m + (a -> (m.getOrElse(a, 0) + 1))
}
l.foldLeft(immutable.Map[A, Int]())(foldF)
}
def groupByAndCount[A](l: Seq[A]): immutable.Map[A, Int] =
l.groupBy(x => x).mapValues(l => l.size)
def iterateToMutable[A](l: Seq[A]): mutable.Map[A, Int] = {
val m = mutable.Map[A, Int]()
for (a <- l) {
m(a) = m.getOrElse(a, 0) + 1
}
m
}
def iterateToJavaMutable[A](l: Seq[A]): java.util.Map[A, Int] = {
val m = new java.util.HashMap[A, Int]()
for (a <- l) {
m.compute(a, (k, v) => if (v == null) 1 else v + 1)
}
m
}
def fixedTypeLongCustomMap(l: Seq[Long]): Long2IntOpenHashMap = {
val m = new Long2IntOpenHashMap
for (a <- l) {
m.addTo(a, 1)
}
m
}
Upvotes: 2
Views: 715
Reputation: 4133
Check this:
def foldImmutableAggregate[A](l: Seq[A]) : Map[A, Int] = {
l.aggregate(Map[A, Int]())({ (sum, ch) => sum ++ Map(ch -> (sum.getOrElse(ch, 0) + 1)) }, (a, b) => a ++ b)
}
or
def foldImmutableAggregate[A](l: Seq[A]) : Map[A, Int] = {
l.par.aggregate(Map[A, Int]())({ (sum, ch) => sum ++ Map(ch -> (sum.getOrElse(ch, 0) + 1)) }, (a, b) => a ++ b)
}
to process the Seq in chunks
Upvotes: 1
Reputation: 14825
Have you tried Scala standard library groupBy
on both mutable and immutable collections
on Immuable Seq
scala> Seq(1, 2, 2, 2, 1, 1, 1, 4, 4).groupBy(identity).mapValues(_.size)
res16: scala.collection.immutable.Map[Int,Int] = Map(2 -> 3, 4 -> 2, 1 -> 4)
on mutable ListBuffer
scala> ListBuffer(1, 2, 1, 2).groupBy(identity).mapValues(_.size)
res17: scala.collection.immutable.Map[Int,Int] = Map(2 -> 2, 1 -> 2)
Upvotes: 0