clay
clay

Reputation: 20410

Scala efficiently convert Seq[A] to frequency map Map[A, Int]

I implemented four correct versions of this function. I'd like a semi-idiomatic Scala version that runs faster and more in line with the Java-like implementations.

groupByAndCount: Cleanest and most elegant. Unfortunately, it's slow.

foldImmutable: Fully internally immutable. Runs even slower.

iterateToMutable: Simple mutable version. Still slow.

iterateToJavaMutable: Uses a Java (mutable) HashMap which offers a compute function so the code can avoid separate get/set functions for each element iteration.

fixedTypeLongCustomMap: This is using a custom non-generics collection it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap and runs the fastest.

Here are some jmh benchmarks:

Benchmark                                     Mode  Cnt  Score   Error  Units
FreqMapGenerationJava.fixedTypeLongCustomMap  avgt    5  0.255 ± 0.061   s/op
FreqMapGenerationJava.foldImmutable           avgt    5  3.728 ± 0.318   s/op
FreqMapGenerationJava.groupByAndCount         avgt    5  1.315 ± 0.405   s/op
FreqMapGenerationJava.iterateToJavaMutable    avgt    5  0.654 ± 0.080   s/op
FreqMapGenerationJava.iterateToMutable        avgt    5  1.356 ± 0.240   s/op

Here is full Scala code:

  def foldImmutable[A](l: Seq[A]): immutable.Map[A, Int] = {
    def foldF(m: immutable.Map[A, Int], a: A): immutable.Map[A, Int] = {
      m + (a -> (m.getOrElse(a, 0) + 1))
    }

    l.foldLeft(immutable.Map[A, Int]())(foldF)
  }

  def groupByAndCount[A](l: Seq[A]): immutable.Map[A, Int] =
    l.groupBy(x => x).mapValues(l => l.size)

  def iterateToMutable[A](l: Seq[A]): mutable.Map[A, Int] = {
    val m = mutable.Map[A, Int]()
    for (a <- l) {
      m(a) = m.getOrElse(a, 0) + 1
    }
    m
  }

  def iterateToJavaMutable[A](l: Seq[A]): java.util.Map[A, Int] = {
    val m = new java.util.HashMap[A, Int]()
    for (a <- l) {
      m.compute(a, (k, v) => if (v == null) 1 else v + 1)
    }
    m
  }

  def fixedTypeLongCustomMap(l: Seq[Long]): Long2IntOpenHashMap = {
    val m = new Long2IntOpenHashMap
    for (a <- l) {
      m.addTo(a, 1)
    }
    m
  }

Upvotes: 2

Views: 715

Answers (2)

Emiliano Martinez
Emiliano Martinez

Reputation: 4133

Check this:

def foldImmutableAggregate[A](l: Seq[A]) : Map[A, Int] = {
  l.aggregate(Map[A, Int]())({ (sum, ch) => sum ++ Map(ch -> (sum.getOrElse(ch, 0) + 1)) }, (a, b) => a ++ b) 
}

or

def foldImmutableAggregate[A](l: Seq[A]) : Map[A, Int] = {
  l.par.aggregate(Map[A, Int]())({ (sum, ch) => sum ++ Map(ch -> (sum.getOrElse(ch, 0) + 1)) }, (a, b) => a ++ b) 
}

to process the Seq in chunks

Upvotes: 1

Nagarjuna Pamu
Nagarjuna Pamu

Reputation: 14825

Have you tried Scala standard library groupBy on both mutable and immutable collections

on Immuable Seq

scala> Seq(1, 2, 2, 2, 1, 1, 1, 4, 4).groupBy(identity).mapValues(_.size)
res16: scala.collection.immutable.Map[Int,Int] = Map(2 -> 3, 4 -> 2, 1 -> 4)

on mutable ListBuffer

scala> ListBuffer(1, 2, 1, 2).groupBy(identity).mapValues(_.size)
res17: scala.collection.immutable.Map[Int,Int] = Map(2 -> 2, 1 -> 2)

Upvotes: 0

Related Questions