Reputation: 45
The following is a bit of code adapted from an online Scala tutorial:
object Test {
def main(args: Array[String]) {
val fruit1 = List("apples", "oranges", "pears")
val fruit2 = List("mangoes", "banana")
val veg1 = Set("potatoes", "carrots", "peas")
val veg2 = Set("cabbage", "squash")
var fruit = fruit1 ++ fruit2
println( "fruit1 ++ fruit2 : " + fruit )
var vegetable = veg1 ++ veg2
println( "veg1 ++ veg2 : " + vegetable )
}
}
And here are the results:
fruit1 ++ fruit2 : List(apples, oranges, pears, mangoes, banana)
veg1 ++ veg2 : Set(cabbage, peas, squash, carrots, potatoes)
The List concatenation works as I expected, but I would never have predicted the resulting order of the Set concatenation. Can anyone explain how the results are arrived at? The online tutorial is not much help here.
Upvotes: 1
Views: 3034
Reputation: 10428
Note that since Set
is an unordered data structure, implementations are free to use any order they like.
The default implementation of Set
uses a HashSet
under the hood. This is implemented using a hashing approach which makes O(n) to determine whether an item exists in the Set. The ordering on output is probably the order of the hashes generated per item, which is effectively random.
An alternative implementation is TreeSet
, which stores the data as a tree, so is O(log(n)) to determine whether an item exists in the set. In this case, the output ordering is the order of the items as they would be in a Tree.
The following code demonstrates this:
import scala.collection.immutable.{TreeSet, HashSet}
val veg1 = HashSet("potatoes", "carrots", "peas")
val veg2 = HashSet("cabbage", "squash")
val vegetable = veg1 ++ veg2
println( "veg1 ++ veg2 : " + vegetable )
//veg1 ++ veg2 : Set(cabbage, peas, squash, carrots, potatoes)
val trVeg1 = TreeSet("potatoes", "carrots", "peas")
val trVeg2 = TreeSet("cabbage", "squash")
val trVegetable = trVeg1 ++ trVeg2
println( "trVeg1 ++ trVeg2 : " + trVegetable )
//trVeg1 ++ trVeg2 : TreeSet(cabbage, carrots, peas, potatoes, squash)
EDIT:
As pointed out in the comments by Travis Brown, it's not exactly true that the "default implementation" of Set is a HashSet. Scala has specific implementations of Set
when there's a small number of items, However when a set of more than 4 items is created, or when the ++
operator is used on instances of Set()
to create a Set
of more than 4 items, the result is a HashSet:
val set1 = Set("abc", "def", "ghi")
val set2 = Set("abcd", "defg", "ghij")
set1.getClass
//scala.collection.immutable.Set$Set3
set2.getClass
//scala.collection.immutable.Set$Set3
val concSet = set1 ++ set2
concSet.getClass
//scala.collection.immutable.HashSet$HashTrieSet
The small Set implementations only go up to Sets of 4 items, so:
val set3 = Set("abcd", "defg", "ghij", "fghtre", "dfghdd")
set3.getClass
//scala.collection.immutable.HashSet$HashTrieSet
https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/immutable/Set.scala#L184
Upvotes: 2
Reputation: 139038
scala.collection.Set
doesn't provide any guarantees about traversal order—if you ask for a string representation of one, the order you get is unspecified and depends on the implementation.
Some subclasses of scala.collection.Set
do provide guarantees about order—e.g. the ordering you see when iterating through a collection.immutable.SortedSet
will depend on the Ordering
instance that you used to construct the set, and collection.mutable.LinkedHashSet
will iterate through its elements in insertion order. When you're working with an arbitrary Set
, though, you should never, ever, rely on the order that you happen to see when using toString
, foreach
, etc.
Upvotes: 6