Martin Trueman
Martin Trueman

Reputation: 45

Concatenation of Sets in Scala, unexpected results

The following is a bit of code adapted from an online Scala tutorial:

object Test {
  def main(args: Array[String]) {
      val fruit1 = List("apples", "oranges", "pears")
      val fruit2 = List("mangoes", "banana")
      val veg1 = Set("potatoes", "carrots", "peas")
      val veg2 = Set("cabbage", "squash")

      var fruit = fruit1 ++ fruit2
      println( "fruit1 ++ fruit2 : " + fruit )
      var vegetable = veg1 ++ veg2
      println( "veg1 ++ veg2 : " + vegetable )
   }
}

And here are the results:

fruit1 ++ fruit2 : List(apples, oranges, pears, mangoes, banana)

veg1 ++ veg2 : Set(cabbage, peas, squash, carrots, potatoes)

The List concatenation works as I expected, but I would never have predicted the resulting order of the Set concatenation. Can anyone explain how the results are arrived at? The online tutorial is not much help here.

Upvotes: 1

Views: 3034

Answers (2)

mattinbits
mattinbits

Reputation: 10428

Note that since Set is an unordered data structure, implementations are free to use any order they like.

The default implementation of Set uses a HashSet under the hood. This is implemented using a hashing approach which makes O(n) to determine whether an item exists in the Set. The ordering on output is probably the order of the hashes generated per item, which is effectively random.

An alternative implementation is TreeSet, which stores the data as a tree, so is O(log(n)) to determine whether an item exists in the set. In this case, the output ordering is the order of the items as they would be in a Tree.

The following code demonstrates this:

import scala.collection.immutable.{TreeSet, HashSet}

val veg1 = HashSet("potatoes", "carrots", "peas")
val veg2 = HashSet("cabbage", "squash")
val vegetable = veg1 ++ veg2
println( "veg1 ++ veg2 : " + vegetable )
//veg1 ++ veg2 : Set(cabbage, peas, squash, carrots, potatoes)

val trVeg1 = TreeSet("potatoes", "carrots", "peas")
val trVeg2 = TreeSet("cabbage", "squash")
val trVegetable = trVeg1 ++ trVeg2
println( "trVeg1 ++ trVeg2 : " + trVegetable )
//trVeg1 ++ trVeg2 : TreeSet(cabbage, carrots, peas, potatoes, squash)

EDIT:

As pointed out in the comments by Travis Brown, it's not exactly true that the "default implementation" of Set is a HashSet. Scala has specific implementations of Set when there's a small number of items, However when a set of more than 4 items is created, or when the ++ operator is used on instances of Set() to create a Set of more than 4 items, the result is a HashSet:

val set1 = Set("abc", "def", "ghi")

val set2 = Set("abcd", "defg", "ghij")

set1.getClass
//scala.collection.immutable.Set$Set3

set2.getClass
//scala.collection.immutable.Set$Set3

val concSet = set1 ++ set2

concSet.getClass
//scala.collection.immutable.HashSet$HashTrieSet

The small Set implementations only go up to Sets of 4 items, so:

val set3 = Set("abcd", "defg", "ghij", "fghtre", "dfghdd")

set3.getClass
//scala.collection.immutable.HashSet$HashTrieSet

https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/immutable/Set.scala#L184

Upvotes: 2

Travis Brown
Travis Brown

Reputation: 139038

scala.collection.Set doesn't provide any guarantees about traversal order—if you ask for a string representation of one, the order you get is unspecified and depends on the implementation.

Some subclasses of scala.collection.Set do provide guarantees about order—e.g. the ordering you see when iterating through a collection.immutable.SortedSet will depend on the Ordering instance that you used to construct the set, and collection.mutable.LinkedHashSet will iterate through its elements in insertion order. When you're working with an arbitrary Set, though, you should never, ever, rely on the order that you happen to see when using toString, foreach, etc.

Upvotes: 6

Related Questions