Reputation: 33489
I was under the impression that in Scala (2.11.7), the following snippets of code would have similar performance characteristics but it appears I am wrong:
a: IndexedSeq[(Int, Int)]
Option 1:
var ans = 0
for {
i <- a.indices
(x1, y1) = a(i)
j <- a.indices drop (i+1)
(x2, y2) = a(j)
if x1 == x2 || y1 == y2
} ans += 1
Option 2:
var ans = 0
for (i <- a.indices) {
val (x1, y1) = a(i)
for (j <- a.indices drop (i+1)) {
val (x2, y2) = a(j)
if (x1 == x2 || y1 == y2) ans += 1
}
}
But, it appears that even for small size (a.length == 100
), the second way is about 5-10x faster than the first way. Also a
is an IndexedSeq
here, so the random-access should hardly matter that much (see f1
vs f2
below). Here is the full benchmarker:
import scala.util.Random
object PerfTester extends App {
def f1(a: IndexedSeq[(Int, Int)]) = {
var ans = 0
for {
i <- a.indices
j <- a.indices drop (i+1)
((x1, y1), (x2, y2)) = (a(i), a(j))
if x1 == x2 || y1 == y2
} ans += 1
ans
}
def f2(a: IndexedSeq[(Int, Int)]) = {
var ans = 0
for {
i <- a.indices
(x1, y1) = a(i)
j <- a.indices drop (i+1)
(x2, y2) = a(j)
if x1 == x2 || y1 == y2
} ans += 1
ans
}
def f3(a: IndexedSeq[(Int, Int)]) = {
var ans = 0
for (i <- a.indices) {
val (x1, y1) = a(i)
for (j <- a.indices drop (i+1)) {
val (x2, y2) = a(j)
if (x1 == x2 || y1 == y2) ans += 1
}
}
ans
}
def profile[R](code: => R, t: Long = System.nanoTime()) = (code, (System.nanoTime() - t)/1e6)
val n = 1000
val data = IndexedSeq.fill(n) {
Random.nextInt(100) -> Random.nextInt(100)
}
val (r1, t1) = profile(f1(data))
val (r2, t2) = profile(f2(data))
val (r3, t3) = profile(f3(data))
require(r1 == r2 && r2 == r3)
println(s"f1: $t1 ms")
println(s"f2: $t2 ms")
println(s"f3: $t3 ms")
}
I know these kind of tests are susceptible to JVM warm-ups, hotspot optimizations and ordering etc so I verified this by randomizing the invocations of f1
, f2
, f3
and averaging the run-times over many different runs.
I ran into this while doing a programming contest on codeforces.com:
Upvotes: 2
Views: 555
Reputation: 167911
The code despite being conceptually equivalent isn't how Scala encodes the previous for loops. In particular, when you write y = x
, Scala performs a separate map operation, and bundles the answers into a tuple.
If you ask at the command line:
scala -Xprint:typer -e 'for {i <- 1 to 10; j = i*i } println(j)'
you get (in part):
def main(args: Array[String]): Unit = {
final class $anon extends scala.AnyRef {
def <init>(): <$anon: AnyRef> = {
$anon.super.<init>();
()
};
scala.this.Predef.intWrapper(1).to(10).map[(Int, Int), scala.collection.immutable.IndexedSeq[(Int, Int)]](((i: Int) => {
val j: Int = i.*(i);
scala.Tuple2.apply[Int, Int](i, j)
}))(immutable.this.IndexedSeq.canBuildFrom[(Int, Int)]).foreach[Unit](((x$1: (Int, Int)) => (x$1: (Int, Int) @unchecked) match {
case (_1: Int, _2: Int)(Int, Int)((i @ _), (j @ _)) => scala.this.Predef.println(j)
}))
};
{
new $anon();
()
}
}
where you can see the extra map in line 7 and tuple being created at line 9. Then it just has to go extract it again on line 11. Compared to inserting that into the closure of the function for the foreach, this is really expensive, especially for integers like I used in this example, since they have to be boxed.
One could argue that the existing method could stand to be improved, but that's how it's presently implemented.
Upvotes: 2