Robin Green
Robin Green

Reputation: 33063

Efficiently iterate over one Set, and then another, in one for loop in Scala

I want to iterate over all the elements of one Set, and then all the elements of another Set, using a single loop. (I don't care about duplicates, because I happen to know the two Sets are disjoint.)

The reason I want to do it in a single loop is because I have some additional code to measure progress, which requires it to be in a single loop.

This doesn't work in general, because it might intermix the two Sets arbitrarily:

for(x <- firstSet ++ secondSet) {
   ...
}

This works, but builds 3 intermediate Seqs in memory, so it's far too inefficient in terms of both time and space usage:

for(x <- firstSet.toSeq ++ secondSet.toSeq) {
   ...
}

Upvotes: 3

Views: 10185

Answers (2)

R&#252;diger Klaehn
R&#252;diger Klaehn

Reputation: 12565

If you just want a traversal, and you want maximum performance, this is the best way even though it is ugly:

val s1 = Set(1,2,3)
val s2 = Set(4,5,6)
val block : Int => Unit = x => { println(x) }
s1.foreach(block)
s2.foreach(block)

Since this is pretty ugly, you can just define a class for it:

def traverse[T](a:Traversable[T], b:Traversable[T]) : Traversable[T] = 
  new Traversable[T] { 
    def foreach[U](f:T=>U) { a.foreach(f); b.foreach(f) } 
  }

And then use it like this:

for(x<-traverse(s1, s2)) println(x)

However, unless this is extremely performance-critical, the solution posted by Robin Green is better. The overhead is creation of two iterators and concatenation of them. If you have deeper nested data structures, concatenating iterators can be quite expensive though. For example a tree iterator that is defined by concatenating the iterators of the subtrees will be painfully slow, whereas a tree traversable where you just call foreach on each subtree will be close to optimal.

Upvotes: 5

Robin Green
Robin Green

Reputation: 33063

for(x <- firstSet.toIterator ++ secondSet.toIterator) {
   ...
}

This doesn't build any intermediate data structures, so I think it's the most efficient way.

Upvotes: 11

Related Questions