Caballero
Caballero

Reputation: 12101

Scala foldLeft performance 4x worse than for loop when working with floats

Can someone with broader knowledge of the subject explain, why is

var acc = 0.0
for (i <- 0 until 100)
    acc += 4.0 * (1 - (i % 2) * 2) / (2 * i + 1)

over 4 times faster than

(0 until 100).foldLeft(0.0)({
    (d, i) => d + 4.0 * (1 - (i % 2) * 2) / (2 * i + 1)
})

I prefer the functional version, just don't quite understand the peformance hit. Any way to optimize this or maybe a foldLeft alternative?

Upvotes: 2

Views: 2581

Answers (2)

Will Fitzgerald
Will Fitzgerald

Reputation: 1382

Here is a simple recursive version:

@tailrec
final def rec(i:Int, max:Int, acc:Double): Double = 
  if (i == max) acc else rec(i+1, max, acc + 4.0 * (1 - (i % 2) * 2) / (2 * i + 1))

I ran these methods and a recursive one through Calipher, using this simple Scala SBT benchmarking template, with sizes of 100, 1000, ..., 10000000. Here are the results:

[info]   length benchmark        ns linear runtime
[info]      100       For       893 =
[info]      100  FoldLeft      1447 =
[info]      100       Rec       888 =
[info]     1000       For     10540 =
[info]     1000  FoldLeft     20322 =
[info]     1000       Rec     11048 =
[info]    10000       For    115499 =
[info]    10000  FoldLeft    255171 =
[info]    10000       Rec    116690 =
[info]   100000       For   1268198 =
[info]   100000  FoldLeft   2223178 =
[info]   100000       Rec   1114504 =
[info]  1000000       For  11985576 =
[info]  1000000  FoldLeft  23916076 ==
[info]  1000000       Rec  11993123 =
[info] 10000000       For 114804171 ==============
[info] 10000000  FoldLeft 244991000 ==============================
[info] 10000000       Rec 119422600 ==============

Notice the functional, non-mutating, recursive version is just about as fast as the for loop (tail recursion for the win). Also, notice that the foldLeft version is about twice as slow.

Upvotes: 7

dhg
dhg

Reputation: 52681

First, This is way too small of an example to get an accurate benchmark. You can't really tell anything from these numbers. You also have to control for things like JVM warm up, which will totally throw off any numbers you get.

That said, you can look at the source to see exactly how these things are implemented. In particular, Range's foldLeft is defined on TraversableOnce, which you can view here:

def foldLeft[B](z: B)(op: (B, A) => B): B = {
  var result = z
  this foreach (x => result = op(result, x))
  result
}

As you can see, foldLeft is just delegating to foreach, which is exactly what the for(...) syntax resolves to.

In other words, they are actually doing the same thing.

The general rule is that you're probably not going to see a real difference performance-wise on anything like this unless you're doing an enormous number of calculations. If it really is an issue, the fastest thing to do would be to use a while loop.

Upvotes: 9

Related Questions