Silverdust
Silverdust

Reputation: 1527

Scala performance with functional constructs

I am currently profiling the performance of an application written in Scala and I am wondering whether functional constructs can be used. On the one hand I love functional programming for its elegance and conciseness, on the other hand I am scared about the resulting performance. I found one particularly good example.

I have a string with a million characters and I need to sum each digit. A typical functional approach would be like this:

val sum = value.map(_.asDigit).sum.toString

However, this beautiful, concise, functional approach takes 0.98s (almost a second)

var sum = 0;

for(digit <- value)
  sum += digit.asDigit

This imperative approach on the other side takes only 0.022s (2.24% of the above time) - it's around 50 times faster...

I am sure the problem arises because Scala is generating a new list in the first approach and then iterating over that list again in order to create the sum.

Is it just a bad idea to rely on functional constructs? I mean, they are beautiful - I love them - but they are 50 times slower...

P.S. I have also tried something else.

val sum = value.foldLeft(0)((sum, value) => sum + value.asDigit)

This functional approach, which is a little less concise and probably even harder to read than the imperative approach, takes 0.085s. It's harder to read AND still 4 times slower...

Upvotes: 3

Views: 561

Answers (1)

R&#252;diger Klaehn
R&#252;diger Klaehn

Reputation: 12565

First of all: are you sure that you have properly benchmarked the two versions? Just measuring the execution time using something like System.nanoTime will not give exact results. See this funny and insightful blog post by JVM performance guru Aleksey Shipilёv.

Here is a benchmark using the excellent Thyme scala benchmarking library:

val value = "1234567890" * 100000
def sumf = value.map(_.asDigit).sum
def sumi = { var sum = 0; for(digit <- value) sum += digit.asDigit; sum }

val th = ichi.bench.Thyme.warmed(verbose = println)
scala> th.pbenchOffWarm("Functional vs. Imperative")(th.Warm(sumf))(th.Warm(sumi))
Benchmark comparison (in 6.654 s): Functional vs. Imperative
Significantly different (p ~= 0)
  Time ratio:    0.36877   95% CI 0.36625 - 0.37129   (n=20)
    First     40.25 ms   95% CI 40.15 ms - 40.34 ms
    Second    14.84 ms   95% CI 14.75 ms - 14.94 ms
res3: Int = 4500000

So yes, the imperative version is faster. But not nearly by as much as you measured. In many, many situations the performance difference will be completely irrelevant. And for those few situations where the performance difference does matter, scala gives you the opportunity to write imperative code. All in all, I think scala is doing pretty well.

By the way: your second approach is almost as fast as the imperative version when properly benchmarked:

def sumf2 = value.foldLeft(0)(_ + _.asDigit)

scala> th.pbenchOffWarm("Functional2 vs. Imperative")(th.Warm(sumf2))(th.Warm(sumi))
Benchmark comparison (in 3.886 s): Functional2 vs. Imperative
Significantly different (p ~= 0)
  Time ratio:    0.89560   95% CI 0.88823 - 0.90297   (n=20)
    First     16.95 ms   95% CI 16.85 ms - 17.04 ms
    Second    15.18 ms   95% CI 15.08 ms - 15.27 ms
res17: Int = 4500000

Update due to suggestion from @Odomontois : Note that if you really want to optimize this, you have to make sure that the chars of the string are not being boxed. Here is an imperative version that is not very nice to look at, but also almost as fast as possible. This is using the cfor macro from spire, but a while loop would work just as well.

def sumi3 = {
  var sum = 0
  cfor(0)(_ < value.length, _ + 1) { i => 
    sum += value(i).asDigit
  }
  sum
}

scala> th.pbenchOffWarm("Imperative vs. optimized Imperative")(th.Warm(sumi))(th.Warm(sumi3))
Benchmark comparison (in 4.401 s): Imperative vs. optimized Imperative
Significantly different (p ~= 0)
  Time ratio:    0.08925   95% CI 0.08880 - 0.08970   (n=20)
    First     15.10 ms   95% CI 15.04 ms - 15.16 ms
    Second    1.348 ms   95% CI 1.344 ms - 1.351 ms
res9: Int = 4500000

Premature optimization disclaimer:

Unless you are absolutely sure that a) a piece of code is a performance bottleneck and b) the imperative version is much faster, I would always prefer the most readable version over the fastest. Scala 2.12 will come with a new optimizer that will make a lot of the overhead of functional style much smaller, since it can do advanced optimizations such as closure inlining in many cases.

Upvotes: 14

Related Questions