How is Scala so efficient with lists?

Question

It's usually considered a bad practice to make unnecessary collections in Java as it consumes some memory and CPU. Scala seems to be pretty efficient with it and encourages to use immutable data structures.

How is Scala so efficient with Lists? What techniques are used to achieve that?

puhlen · Accepted Answer

While the comments are correct that the claim that list is particularly efficient is a dubious one, it does much better than doing full copies of the collection for every operation like you would do with Java's standard collections.

The reason for this is List and the other immutable collections are not just mutable collections with mutation methods returning a copy, but are designed differently to with immutability in mind. They Take advantage of something called "structural sharing". If parts of a collection remain the same after a change, then those parts don't need to be copied and the same object can be shared across multiple collections. This works because of immutability, there is no change that they could be altered so it's safe to share.

Imagine the simplest example, prepending to a list.

You have a List(1,2,3) and you want to prepend 0

val original = List(1,2,3)
val updated = 0 :: original

You list would then look something like this

updated original
    \       \
     0 - - - 1 - - - 2 - - - 3

All that's needed is to create a new node and point it's tail to the head of your original list. Nothing needs to be copied. Similarly the tail and drop operations just need to return a reference to the appropriate node and nothing needs to be copied. This is why List can be quite good with the prepend and tail operations, because it doesn't do any copying even though it creates a "new" List.

Other List operations do require some amount copying, but always as little as possible. As long as part of the tail of a list is unchanged it doesn't need to be copied. For example when concatenating lists, the first list needs to be copied, but then it's tail can just point to the head of the second, so the second list doesn't need to be copied at all. This is why, when concatenating a long and short list it's better to put the shorter list on the "left" as it is the only one that needs to be copied.

Other types of collections are better at different operations. Vector for example can to both prepend and append in amortized constant time, as well as having good random access and update capabilities (though still much worse than a raw mutable array). In most cases it will be more efficient than List while still being immutable. It's implementation is quite complicated. It uses a trie datastructure, with many internal arrays to store data. The unchanged ones can be shared and only the ones that need to be altered by an update operation need to be copied.

How is Scala so efficient with lists?

Answers (1)

Related Questions