Omer Zach
Omer Zach

Reputation: 325

In Scala, how do immutable and mutable sets and maps compare with regard to garbage collection?

I'm writing some code that involves taking sets and maps with "small" (e.g., short strings or simple case classes) objects in them while recursing through a large structure, at each point adding a small (usually 1, sometimes a handful) objects to the set or map. It appears as if using mutable sets and maps gives a significant speed-up over immutable ones, but I'm having trouble quantitatively assessing the difference.

Does it make sense that Scala's garbage collection would cause a significant slow-down when I'm using immutable data structures? Would using mutable data structures fix this?

Upvotes: 5

Views: 978

Answers (2)

Kareem
Kareem

Reputation: 844

Scala Mutable data structures gain efficiency over immutable ones by pre-allocating memory. They are better suited to many inserts (hence why they're mutable). Take a look at the implementation of the function += in the default mutable Collection, a HashMap, which Map extends:

https://github.com/scala/scala/blob/v2.9.2/src/library/scala/collection/mutable/HashMap.scala#L84

def += (kv: (A, B)): this.type = {
  val e = findEntry(kv._1)
  if (e == null) addEntry(new Entry(kv._1, kv._2))
  else e.value = kv._2
  this
}

HashMap implements a mutable Map using HashTable, which defines addEntry

https://github.com/scala/scala/blob/v2.9.2/src/library/scala/collection/mutable/HashTable.scala#L117

protected def addEntry(e: Entry) {
  val h = index(elemHashCode(e.key))
  e.next = table(h).asInstanceOf[Entry]
  table(h) = e
  tableSize = tableSize + 1
  nnSizeMapAdd(h)
  if (tableSize > threshold)
    resize(2 * table.length)
} 

The size of the collection is doubled every time the threshold is reached. So if you're repeatedly adding n entires one entry at a time to an empty mutable data structure, you'll only need to resize log(n) times. I haven't looked at the immutable data structure implementation in depth, but I'm assuming you're going to have to resize on every insert. Hence your performance disparity.

Upvotes: -1

Jens Schauder
Jens Schauder

Reputation: 81907

The Scala immutable collections are surprisingly efficient. Mostly because when a structure gets changed, lots of the structure gets reused.

But if you do lots of changes mutable structures might be a better fit. Actually this is what the Scala Collection API does in many places internally: Use a mutable datastructure to build new stuff and only as a last step, create a immutable and return it.

Upvotes: 6

Related Questions