Scala: IndexedSeq.newBuilder vs. ArrayBuffer

Question

I had accepted that building an IndexedSeq in a loop should use an ArrayBuffer, followed by a conversion to a Vector via ".toVector()".

In an example profiled showed the CPU hotspot was in this section, and so I tried an alternative: use IndexedSeq.newBuilder() followed by conversion to immutable via ".result()".

This change gave a significance performance improvement. The code looks almost the same. So it seems using IndexedSeq.newBuilder() is best practice. Is this correct? The example method is shown below, with the ArrayBuffer difference commented out.

def interleave[T](a: IndexedSeq[T], b: IndexedSeq[T]): IndexedSeq[T] = {

  val al = a.length
  val bl = b.length

  val buffer = IndexedSeq.newBuilder[T]
  //---> val buffer = new ArrayBuffer[T](al + bl)
  val commonLength = Math.min(al, bl)
  val aExtra = al - commonLength
  val bExtra = bl - commonLength

  var i = 0
  while (i < commonLength) {
    buffer += a(i)
    buffer += b(i)
    i += 1
  }

  if (aExtra > 0) {
    while (i < al) {
      buffer += a(i)
      i += 1
    }
  } else if (bExtra > 0) {
    while (i < bl) {
      buffer += b(i)
      i += 1
    }
  }

  buffer.result()
  //---> buffer.toVector()
}

Mike Hanafey · Accepted Answer

In this example informal testing did not deceive, but ScalaMeter does provide a clearer picture of performance. Building the result in an ArrayBuffer (top orange line) is definitely slower than the more direct newBuilder (blue line).

Returning the ArrayBuffer as an IndexedSeq is the fastest (green line), but of course it does not give you the true protection of an immutable collection.

Building the intermediate result in an Array (red line) is intermediate between ArrayBuffer and newBuilder.

The "zipAll" collection method allows the interleave to be done in a more functional style:

  def interleaveZipAllBuilderPat[T](a: IndexedSeq[T], b: IndexedSeq[T]): IndexedSeq[T] = {
    a.zipAll(b, null, null).foldLeft(Vector.newBuilder[T]) { (z, tp) =>
      tp match {
        case ((x:T, null)) => z += x
        case ((x:T,y:T)) => z += x += y
      }
    }.result()
  }

The slowest are the functional method, with the top two almost the same and these differ only in that one does the pattern match, and the other an if statement, so the pattern is not slow.

Functional is marginally worse than the direct loop method if an ArrayBuffer is used to accumulate the result, but the direct loop using the newBuilder is significantly faster.

If "zipAll" could return a builder, and if the builder were iterable, the functional style could be faster - no need to produce the immutable result if the next step just requires an iteration over elements.

So for me newBuilder is the clear winner.

Scala: IndexedSeq.newBuilder vs. ArrayBuffer

Answers (2)

Related Questions