Scala: Performance difference between nested for loops and for-comprehensions

Question

I was under the impression that in Scala (2.11.7), the following snippets of code would have similar performance characteristics but it appears I am wrong:

a: IndexedSeq[(Int, Int)]

Option 1:

var ans = 0
for {
  i <- a.indices
  (x1, y1) = a(i)
  j <- a.indices drop (i+1)
  (x2, y2) = a(j)
  if x1 == x2 || y1 == y2
} ans += 1

Option 2:

var ans = 0
for (i <- a.indices) {
  val (x1, y1) = a(i)
  for (j <- a.indices drop (i+1)) {
    val (x2, y2) = a(j)
    if (x1 == x2 || y1 == y2) ans += 1
  }
}

But, it appears that even for small size (a.length == 100), the second way is about 5-10x faster than the first way. Also a is an IndexedSeq here, so the random-access should hardly matter that much (see f1 vs f2 below). Here is the full benchmarker:

import scala.util.Random

object PerfTester extends App {

  def f1(a: IndexedSeq[(Int, Int)]) = {
    var ans = 0
    for {
      i <- a.indices
      j <- a.indices drop (i+1)
      ((x1, y1), (x2, y2)) = (a(i), a(j))
      if x1 == x2 || y1 == y2
    } ans += 1
    ans
  }

  def f2(a: IndexedSeq[(Int, Int)]) = {
    var ans = 0
    for {
      i <- a.indices
      (x1, y1) = a(i)
      j <- a.indices drop (i+1)
      (x2, y2) = a(j)
      if x1 == x2 || y1 == y2
    } ans += 1
    ans
  }

  def f3(a: IndexedSeq[(Int, Int)]) = {
    var ans = 0
    for (i <- a.indices) {
      val (x1, y1) = a(i)
      for (j <- a.indices drop (i+1)) {
        val (x2, y2) = a(j)
        if (x1 == x2 || y1 == y2) ans += 1
      }
    }
    ans
  }

  def profile[R](code: => R, t: Long = System.nanoTime()) = (code, (System.nanoTime() - t)/1e6)

  val n = 1000
  val data = IndexedSeq.fill(n) {
    Random.nextInt(100) -> Random.nextInt(100)
  }
  val (r1, t1) = profile(f1(data))
  val (r2, t2) = profile(f2(data))
  val (r3, t3) = profile(f3(data))
  require(r1 == r2 && r2 == r3)
  println(s"f1: $t1 ms")
  println(s"f2: $t2 ms")
  println(s"f3: $t3 ms")
}

I know these kind of tests are susceptible to JVM warm-ups, hotspot optimizations and ordering etc so I verified this by randomizing the invocations of f1, f2, f3 and averaging the run-times over many different runs.

I ran into this while doing a programming contest on codeforces.com:

This got "time-limit exceeded": http://codeforces.com/contest/629/submission/16248545
This got "accepted": http://codeforces.com/contest/629/submission/16248804

Scala: Performance difference between nested for loops and for-comprehensions

Answers (1)

Related Questions