Reputation: 936
I would like to know if an update operation on a mutable map is better in performance than reassignment.
Lets assume I have the following Map
val m=Map(1 -> Set("apple", "banana"),
2 -> Set("banana", "cabbage"),
3 -> Set("cabbage", "dumplings"))
which I would like to reverse into this map:
Map("apple" -> Set(1),
"banana" -> Set(1, 2),
"cabbage" -> Set(2, 3),
"dumplings" -> Set(3))
The code to do so is:
def reverse(m:Map[Int,Set[String]])={
var rm = Map[String,Set[Int]]()
m.keySet foreach { k=>
m(k) foreach { e =>
rm = rm + (e -> (rm.getOrElse(e, Set()) + k))
}
}
rm
}
Would it be more efficient to use the update operator on a map if it is very large in size?
The code using the update on map is as follows:
def reverse(m:Map[Int,Set[String]])={
var rm = scala.collection.mutable.Map[String,Set[Int]]()
m.keySet foreach { k=>
m(k) foreach { e =>
rm.update(e,(rm.getOrElse(e, Set()) + k))
}
}
rm
}
Upvotes: 1
Views: 1155
Reputation: 936
Just using @Aivean method in a functional way:
def transform(mp :Map[Int, Set[String]]) = {
val accum = mp.values.flatten
.toSet.map( (_-> scala.collection.mutable.Set[Int]())).toMap
mp.map {case(k,vals) => vals.map( v => accum(v)+=k)}
accum.mapValues(_.toSet)
}
Upvotes: 0
Reputation: 51271
I ran some tests using Rex Kerr's Thyme utility.
First I created some test data.
val rndm = new util.Random
val dna = Seq('A','C','G','T')
val m = (1 to 4000).map(_ -> Set(rndm.shuffle(dna).mkString
,rndm.shuffle(dna).mkString)).toMap
Then I timed some runs with both the immutable.Map
and mutable.Map
versions. Here's an example result:
Time: 2.417 ms 95% CI 2.337 ms - 2.498 ms (n=19) // immutable
Time: 1.618 ms 95% CI 1.579 ms - 1.657 ms (n=19) // mutable
Time 2.278 ms 95% CI 2.238 ms - 2.319 ms (n=19) // functional version
As you can see, using a mutable Map with update()
has a significant performance advantage.
Just for fun I also compared these results with a more functional version of a Map reverse (or what I call a Map inverter). No var
or any mutable type involved.
m.flatten{case(k, vs) => vs.map((_, k))}
.groupBy(_._1)
.mapValues(_.map(_._2).toSet)
This version consistently beat your immutable version but still doesn't come close to the mutable timings.
Upvotes: 3
Reputation: 10882
The trade-of between mutable and immutable collections usually narrows down to this:
Some time ago I did comparison of performance between mutable and immutable Maps in Scala and the difference was about 2 to 3 times in favor of mutable ones.
So, when performance is not critical I usually go with immutable collections for safety and readability.
For example, in your case functional "scala way" of performing this transformation would be something like this:
m.view
.flatMap(x => x._2.map(_ -> x._1)) // flatten map to lazy view of String->Int pairs
.groupBy(_._1) // group pairs by String part
.mapValues(_.map(_._2).toSet) // extract all Int parts into Set
Although I used lazy view to avoid creating intermediate collections, groupBy
still internally creates mutable map (you may want to check it's sources, the logic is pretty similar to what you have wrote), which in turn gets converted to immutable Map
which then gets discarded by mapValues
.
Now, if you want to squeeze every bit of performance you want to use mutable collections and do as little updates of immutable collections as possible.
For your case is means having Map
of mutable Sets
as you intermediate buffer:
def transform(m:Map[Int, Set[String]]):Map[String, Set[Int]] = {
val accum:Map[String, mutable.Set[Int]] =
m.valuesIterator.flatten.map(_ -> mutable.Set[Int]()).toMap
for ((k, vals) <- m; v <- vals) {
accum(v) += k
}
accum.mapValues(_.toSet)
}
Note, I'm not updating accum
once it's created: I'm doing exactly one map lookup and one set update for each value, while in both your examples there was additional map update.
I believe this code is reasonably optimal performance wise. I didn't perform any tests myself, but I highly encourage you to do that on your real data and post results here.
Also, if you want to go even further, you might want to try mutable BitSet
instead of Set[Int]
. If ints in your data are fairly small it might yield some minor performance increase.
Upvotes: 1