MisterMetaphor

Reputation: 6018

String concatenation optimization in the F# compiler

The C# compiler is smart enough to optimize string concatenation with the + operator into String.Concat calls.

The following code:

var a = "one";
var b = "two";
var c = "three";
var d = "four";

var x = a + b + c + d;

is compiled into this IL:

IL_0000:  ldstr       "one"
IL_0005:  stloc.0     // a
IL_0006:  ldstr       "two"
IL_000B:  stloc.1     // b
IL_000C:  ldstr       "three"
IL_0011:  stloc.2     // c
IL_0012:  ldstr       "four"
IL_0017:  stloc.3     // d
IL_0018:  ldloc.0     // a
IL_0019:  ldloc.1     // b
IL_001A:  ldloc.2     // c
IL_001B:  ldloc.3     // d
IL_001C:  call        System.String.Concat

The compiler picked the String.Concat overload that takes four arguments and emitted a single call to it.

The F# compiler doesn't do that. Instead, each + is compiled into a separate call to String.Concat:

IL_0005:  ldstr       "one"
IL_000A:  ldstr       "two"
IL_000F:  call        System.String.Concat
IL_0014:  ldstr       "three"
IL_0019:  call        System.String.Concat
IL_001E:  ldstr       "four"
IL_0023:  call        System.String.Concat
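For reference, here is a minimal sketch of F# source that would plausibly produce IL of this shape, next to an explicit call to the four-argument String.Concat overload. The original post does not show the F# source, so the binding names (a, b, c, d, chained, single) are assumptions that mirror the C# example.

```fsharp
open System

let a = "one"
let b = "two"
let c = "three"
let d = "four"

// Chained (+): per the IL above, each (+) becomes its own String.Concat call,
// so intermediate strings ("onetwo", "onetwothree") are allocated along the way.
let chained = a + b + c + d

// Explicit call to the four-argument overload: a single call,
// which is what the C# compiler emits for a + b + c + d.
let single = String.Concat(a, b, c, d)

printfn "%s" chained
printfn "%s" single
```

Both bindings hold the same string; the only difference is how many String.Concat calls are made to build it.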

Obviously this is because this particular optimization is not implemented in the F# compiler.

The question is why: is it technically hard to do or is there some other reason?

String concatenation is a fairly common operation, and while I realize that the performance of compiled code is not a top priority, I imagine this kind of optimization would be useful in many cases.

Upvotes: 6

Views: 243

Answers (2)

Stephen Swensen

Reputation: 22297

I am actually surprised the F# compiler doesn't do this optimization, since it falls under the general umbrella of constant propagation and folding, which are common and relatively simple optimizations that can be applied to constants of any kind when the operations are known to the compiler (admittedly, there may be considerations I am not aware of).

Note that C# cleverly uses the four-argument String.Concat overload but neither propagates nor folds the string constants (the advantage is that this optimization works just as well for non-constant strings). F#, on the other hand, propagates the constants but then neglects to fold them (who knows, maybe the JIT is smart enough to do the folding!).

What's interesting to me is that F# does fold Int16 and Int32 constants but doesn't fold Double or Float constants (and those are just the constants I tested).

Upvotes: 4

Tomas Petricek

Reputation: 243096

I don't think there is anything hard about the optimization. I think the main reason it is not implemented is that it is specific to string concatenation and does not apply more generally. However, it sounds like something that would be an interesting project using the F# open source release!

That said, even the C# optimization done above is not that clever. There is no reason why the compiler shouldn't just concatenate the strings directly when they are constants and produce:

IL_0000:  ldstr   "onetwothreefour"

In other words, there is always a tradeoff between adding something that is generally useful and adding more and more special cases - the C# compiler apparently has a few more special cases related to string concatenation...
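As a side note, the folding described above can be requested explicitly when the strings are declared as compile-time constants. The C# compiler folds + over string literals and const strings (the var locals in the question are not constant expressions, which is why they are not folded there). On the F# side, the sketch below assumes a compiler that accepts string concatenation inside a [<Literal>] definition (F# 3.1 or later, as far as I know); the name Folded is made up for illustration.

```fsharp
// A [<Literal>] binding must be a compile-time constant, so the
// concatenation is evaluated by the compiler rather than at run time.
[<Literal>]
let Folded = "one" + "two" + "three" + "four"

printfn "%s" Folded
```

This only helps when every operand is itself a literal; for ordinary let bindings the chained String.Concat calls shown in the question still apply.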

Upvotes: 9
