pathikrit
pathikrit

Reputation: 33489

Scala: Performance difference between nested for loops and for-comprehensions

I was under the impression that in Scala (2.11.7), the following snippets of code would have similar performance characteristics but it appears I am wrong:

a: IndexedSeq[(Int, Int)]

Option 1:

var ans = 0
for {
  i <- a.indices
  (x1, y1) = a(i)
  j <- a.indices drop (i+1)
  (x2, y2) = a(j)
  if x1 == x2 || y1 == y2
} ans += 1

Option 2:

var ans = 0
for (i <- a.indices) {
  val (x1, y1) = a(i)
  for (j <- a.indices drop (i+1)) {
    val (x2, y2) = a(j)
    if (x1 == x2 || y1 == y2) ans += 1
  }
}

But, it appears that even for small size (a.length == 100), the second way is about 5-10x faster than the first way. Also a is an IndexedSeq here, so the random-access should hardly matter that much (see f1 vs f2 below). Here is the full benchmarker:

import scala.util.Random

object PerfTester extends App {

  def f1(a: IndexedSeq[(Int, Int)]) = {
    var ans = 0
    for {
      i <- a.indices
      j <- a.indices drop (i+1)
      ((x1, y1), (x2, y2)) = (a(i), a(j))
      if x1 == x2 || y1 == y2
    } ans += 1
    ans
  }

  def f2(a: IndexedSeq[(Int, Int)]) = {
    var ans = 0
    for {
      i <- a.indices
      (x1, y1) = a(i)
      j <- a.indices drop (i+1)
      (x2, y2) = a(j)
      if x1 == x2 || y1 == y2
    } ans += 1
    ans
  }

  def f3(a: IndexedSeq[(Int, Int)]) = {
    var ans = 0
    for (i <- a.indices) {
      val (x1, y1) = a(i)
      for (j <- a.indices drop (i+1)) {
        val (x2, y2) = a(j)
        if (x1 == x2 || y1 == y2) ans += 1
      }
    }
    ans
  }

  def profile[R](code: => R, t: Long = System.nanoTime()) = (code, (System.nanoTime() - t)/1e6)

  val n = 1000
  val data = IndexedSeq.fill(n) {
    Random.nextInt(100) -> Random.nextInt(100)
  }
  val (r1, t1) = profile(f1(data))
  val (r2, t2) = profile(f2(data))
  val (r3, t3) = profile(f3(data))
  require(r1 == r2 && r2 == r3)
  println(s"f1: $t1 ms")
  println(s"f2: $t2 ms")
  println(s"f3: $t3 ms")
}

I know these kind of tests are susceptible to JVM warm-ups, hotspot optimizations and ordering etc so I verified this by randomizing the invocations of f1, f2, f3 and averaging the run-times over many different runs.

I ran into this while doing a programming contest on codeforces.com:

  1. This got "time-limit exceeded": http://codeforces.com/contest/629/submission/16248545
  2. This got "accepted": http://codeforces.com/contest/629/submission/16248804

Upvotes: 2

Views: 555

Answers (1)

Rex Kerr
Rex Kerr

Reputation: 167911

The code despite being conceptually equivalent isn't how Scala encodes the previous for loops. In particular, when you write y = x, Scala performs a separate map operation, and bundles the answers into a tuple.

If you ask at the command line:

scala -Xprint:typer -e 'for {i <- 1 to 10; j = i*i } println(j)'

you get (in part):

def main(args: Array[String]): Unit = {
  final class $anon extends scala.AnyRef {
    def <init>(): <$anon: AnyRef> = {
      $anon.super.<init>();
      ()
    };
    scala.this.Predef.intWrapper(1).to(10).map[(Int, Int), scala.collection.immutable.IndexedSeq[(Int, Int)]](((i: Int) => {
      val j: Int = i.*(i);
      scala.Tuple2.apply[Int, Int](i, j)
    }))(immutable.this.IndexedSeq.canBuildFrom[(Int, Int)]).foreach[Unit](((x$1: (Int, Int)) => (x$1: (Int, Int) @unchecked) match {
        case (_1: Int, _2: Int)(Int, Int)((i @ _), (j @ _)) => scala.this.Predef.println(j)
    }))
  };
  {
    new $anon();
    ()
  }
}

where you can see the extra map in line 7 and tuple being created at line 9. Then it just has to go extract it again on line 11. Compared to inserting that into the closure of the function for the foreach, this is really expensive, especially for integers like I used in this example, since they have to be boxed.

One could argue that the existing method could stand to be improved, but that's how it's presently implemented.

Upvotes: 2

Related Questions