Split long interpolated string

Question

An example:

var a = $"Some value 1: {b1:0.00}\nSome value 2: {b2}\nSome value 3: {b3:0.00000}\nSome value 4: {b4:0.00}\nSome value 5: {b6:0.0}\nSome value 7: {b7:0.000000000}";

That source is somewhat hard to read.

I can do this:

var a = $"Some value 1: {b1:0.00}\n" +
        $"Some value 2: {b2}\n" +
        $"Some value 3: {b3:0.00000}\n" +
        $"Some value 4: {b4:0.00}\n" +
        $"Some value 5: {b6:0.0}\n" +
        $"Some value 7: {b7:0.000000000}";

But there is a comment saying that this will be multiple calls to string.Format, and I think it will (no idea how to check it, IL is a black box for me yet).

Question: is it ok to do? What are other options to split long interpolated string?

atlaste · Accepted Answer

What does the compiler do?

Let's start here:

var a = $"Some value 1: {b1:0.00}\n" +
        $"Some value 2: {b2}\n" +
        $"Some value 3: {b3:0.00000}\n" +
        $"Some value 4: {b4:0.00}\n" +
        $"Some value 5: {b6:0.0}\n" +
        $"Some value 7: {b7:0.000000000}";

IL is a black box for me yet

Why not simply open it up? That's pretty easy using a tool like ILSpy, Reflector, etc.

What will happen in your code is that each line is compiled to a string.Format. The rule is pretty simple: if you have $"...{X}...{Y}..." it will be compiled as string.Format("...{0}...{1}...", X, Y). Also the + operator will introduce a string concatenation.
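As a minimal sketch of that rule (the class and method names are made up for illustration), the two methods below should return the same text. Newer compilers may emit different code for the interpolated form, but the resulting string is the same.

```csharp
using System;

public static class InterpolationSketch
{
    // Interpolated form, in the style used in the question.
    public static string Interpolated(double x, int y) =>
        $"Value 1: {x:0.00}, value 2: {y}";

    // Roughly the string.Format call described above: each hole becomes
    // a numbered placeholder and keeps its format specifier.
    public static string Formatted(double x, int y) =>
        string.Format("Value 1: {0:0.00}, value 2: {1}", x, y);
}
```

Both methods format with the current culture, so they agree with each other on any machine.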

In more detail, string.Format is a simple static call, which means that the compiler will use the call opcode instead of callvirt.

From all this you might deduce that it's pretty easy for a compiler to optimize this: if we have an expression like constant string + constant string + ... you can simply replace it with constant string. You can argue that the compiler has knowledge about the inner workings of string.Format and string concatenation and handle that. On the other hand, you could argue that it should not. Let me detail the two considerations:

Note that strings are objects in .NET, but they are 'special ones'. You can see this from the fact that there's a dedicated ldstr opcode, and also from what happens if you switch on a string: the compiler will generate a dictionary or similar lookup code. So, from this you could deduce that the compiler 'knows' how a string works internally. Let's figure out if it knows how to do concatenation, ok?

var str = "foo" + "bar";
Console.WriteLine(str);

In IL (Release mode of course) this will give:

L_0000: ldstr "foobar"

tl;dr: So, regardless of whether concatenation of interpolated strings is already optimized or not (it is not), I'd be pretty confident that the compiler will handle this case eventually.
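If you'd rather not read IL, one way to confirm that literal concatenation is folded at compile time is to assign it to a const. This is only a sketch with a hypothetical class name: a const initializer has to be a compile-time constant, so the declaration compiles only because the compiler evaluates the + itself.

```csharp
public static class ConstantFoldingSketch
{
    // Accepted by the compiler only because "foo" + "bar" is a
    // constant expression that it evaluates to "foobar" at compile time.
    public const string Joined = "foo" + "bar";
}
```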

What does the JIT do?

Next question would be: how smart is the JIT compiler with strings?

So, let's consider for a moment that we will teach the compiler about all the inner workings of string. First we should note that C# is compiled to IL, which is JIT compiled to assembler. In the case of the switch it's pretty hard for the JIT compiler to create the dictionary, so we have to do it in the compiler. On the other hand, if we're handling more complex concatenation, it makes sense to reuse the machinery we already have for, say, integer arithmetic to do string operations as well. This implies putting string operations in the JIT compiler. Let's consider that for a moment with an example:

var str = "";
for (int i=0; i<10; ++i) {
    str += "foo";
}
Console.WriteLine(str);

The compiler will simply compile the concatenation to IL, which means that the IL will hold a pretty straight-forward implementation of this. In this case loop unrolling arguably has a lot of benefits for the (runtime) performance of the program: it can simply unroll the loop, appending the string 10 times, which results in a simple constant.

However, giving this knowledge to the JIT compiler makes it more complex, which means that the runtime will spend more time on JIT compiling (figuring out the optimization) and less time executing (running the emitted assembler). The question that remains is: what will happen?

Start the program, put a breakpoint on the WriteLine call, and press Ctrl+Alt+D to see the disassembly.

00007FFCC8044413  jmp         00007FFCC804443F  
            {
                str += "foo";
00007FFCC8044415  mov         rdx,2BEE2093610h  
00007FFCC804441F  mov         rdx,qword ptr [rdx]  
00007FFCC8044422  mov         rcx,qword ptr [rbp-18h]  
00007FFCC8044426  call        00007FFD26434CC0  

[...]
00007FFCC804443A  inc         eax  
00007FFCC804443C  mov         dword ptr [rbp-0Ch],eax  
00007FFCC804443F  mov         ecx,dword ptr [rbp-0Ch]  
00007FFCC8044442  cmp         ecx,0Ah  
00007FFCC8044445  jl          00007FFCC8044415

tl;dr: Nope, that's not optimized.

But I want the JIT to optimize that as well!

Yea, well, I'm not too sure if I share that opinion. There's a balance between runtime performance and time spent in JIT compilation. Notice that if you're doing something like this in a tight loop, I would argue that you're asking for trouble. On the other hand, if it's a common and trivial case (like the constants that are concatenated) it's pretty easy to optimize and it doesn't affect the runtime.
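If you do end up concatenating in a loop, the usual way to avoid allocating a new string on every iteration is a StringBuilder. The sketch below is my own illustration rather than anything the measurements above rely on, and the class and method names are hypothetical.

```csharp
using System.Text;

public static class LoopConcatSketch
{
    // Produces the same text as repeating `str += part` count times,
    // but appends into one growable buffer instead of creating a new
    // string object per iteration.
    public static string Repeat(string part, int count)
    {
        var sb = new StringBuilder(part.Length * count);
        for (int i = 0; i < count; ++i)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }
}
```

For ten short appends, as in the example above, the difference is unlikely to matter; it starts to pay off when the iteration count or string size grows.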

In other words: arguably, you don't want this to be optimized by the JIT, assuming that would take too much time. I'm confident we can trust Microsoft in making this decision wisely.

Also, you should realize that strings in .NET are heavily optimized things. We all know that they're used a lot, and so does Microsoft. If you're not writing 'really stupid code', it's a very reasonable assumption that it will perform just fine (until proven otherwise).

Alternatives?

What are other options to split long interpolated string?

Use resources. Resources are a useful tool in dealing with multiple languages. And if this is just a small, non-professional project - I simply wouldn't bother at all.

Alternatively you can use the fact that constant strings are concatenated:

var fmt = "Some value 1: {0:0.00}\n" +
          "Some value 2: {1}\n" +
          "Some value 3: {2:0.00000}\n" +
          "Some value 4: {3:0.00}\n" +
          "Some value 5: {4:0.0}\n" +
          "Some value 7: {5:0.000000000}";

// Placeholders are zero-based and map to the arguments in order.
var a = string.Format(fmt, b1, b2, b3, b4, b6, b7);
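One thing to watch with a hand-written format string is that placeholder indices are zero-based, and every index must have a matching argument. The sketch below uses hypothetical names and a shortened format string; it passes CultureInfo.InvariantCulture so the output does not depend on the machine's locale, which is my assumption and not part of the original code.

```csharp
using System;
using System.Globalization;

public static class FormatIndexSketch
{
    // Adjacent literals joined with + are folded into one constant.
    private const string Fmt =
        "Some value 1: {0:0.00}\n" +
        "Some value 2: {1}";

    public static string Render(double b1, int b2) =>
        string.Format(CultureInfo.InvariantCulture, Fmt, b1, b2);

    // Returns true when string.Format rejects a placeholder index
    // that has no matching argument.
    public static bool ThrowsOnMissingArgument()
    {
        try
        {
            string.Format("{0} {2}", 1, 2); // only indices 0 and 1 exist
            return false;
        }
        catch (FormatException)
        {
            return true;
        }
    }
}
```

The interpolated form avoids this class of mistake entirely, because the compiler numbers the holes for you.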
