Reputation: 325
I'm writing some code that involves taking sets and maps with "small" (e.g., short strings or simple case classes) objects in them while recursing through a large structure, at each point adding a small (usually 1, sometimes a handful) objects to the set or map. It appears as if using mutable sets and maps gives a significant speed-up over immutable ones, but I'm having trouble quantitatively assessing the difference.
Does it make sense that Scala's garbage collection would cause a significant slow-down when I'm using immutable data structures? Would using mutable data structures fix this?
Upvotes: 5
Views: 978
Reputation: 844
Scala Mutable data structures gain efficiency over immutable ones by pre-allocating memory. They are better suited to many inserts (hence why they're mutable). Take a look at the implementation of the function += in the default mutable Collection, a HashMap, which Map extends:
https://github.com/scala/scala/blob/v2.9.2/src/library/scala/collection/mutable/HashMap.scala#L84
def += (kv: (A, B)): this.type = {
val e = findEntry(kv._1)
if (e == null) addEntry(new Entry(kv._1, kv._2))
else e.value = kv._2
this
}
HashMap implements a mutable Map using HashTable, which defines addEntry
https://github.com/scala/scala/blob/v2.9.2/src/library/scala/collection/mutable/HashTable.scala#L117
protected def addEntry(e: Entry) {
val h = index(elemHashCode(e.key))
e.next = table(h).asInstanceOf[Entry]
table(h) = e
tableSize = tableSize + 1
nnSizeMapAdd(h)
if (tableSize > threshold)
resize(2 * table.length)
}
The size of the collection is doubled every time the threshold is reached. So if you're repeatedly adding n entires one entry at a time to an empty mutable data structure, you'll only need to resize log(n) times. I haven't looked at the immutable data structure implementation in depth, but I'm assuming you're going to have to resize on every insert. Hence your performance disparity.
Upvotes: -1
Reputation: 81907
The Scala immutable collections are surprisingly efficient. Mostly because when a structure gets changed, lots of the structure gets reused.
But if you do lots of changes mutable structures might be a better fit. Actually this is what the Scala Collection API does in many places internally: Use a mutable datastructure to build new stuff and only as a last step, create a immutable and return it.
Upvotes: 6