Reputation: 16232
I'm wondering why this C# code
long b = 20;
is compiled to
ldc.i4.s 0x14
conv.i8
(Because it takes 3 bytes instead of the 9 required by ldc.i8 20. See this for more information.)
while this code
double a = 20;
is compiled to the 9-byte instruction
ldc.r8 20
instead of this 3-byte sequence
ldc.i4.s 0x14
conv.r8
(Using mono 4.8.)
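Here is a minimal program to reproduce this, in case it helps (compile without optimizations so the initializers aren't folded away, then disassemble with monodis or ildasm):
class Repro
{
    static void Main()
    {
        long b = 20;     // compiles to ldc.i4.s 0x14 / conv.i8 (3 bytes)
        double a = 20;   // compiles to ldc.r8 20 (9 bytes)
        System.Console.WriteLine(b + a);   // keep the locals in use
    }
}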
Is this a missed opportunity, or does the cost of conv.i8 outweigh the gain in code size?
Upvotes: 9
Views: 520
Reputation: 63772
Because float is not a smaller double, and integer is not a float (or vice versa).
All int values have a 1:1 mapping to a long value. The same simply isn't true for float and double - floating point operations are tricky that way. Not to mention that int-float conversions aren't free - unlike pushing a 1-byte value on the stack / in a register; look at the x86-64 code produced by both approaches, not just the IL code. The size of the IL code is not the only factor to consider in optimisation.
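To illustrate, here's a small sketch (the specific values are just examples picked near the edge of each type's precision):
using System;

class MappingDemo
{
    static void Main()
    {
        // Every int maps to a distinct long, so int -> long -> int is always lossless.
        int i = int.MaxValue;
        long asLong = i;
        Console.WriteLine((int)asLong == i);                         // True, for every int

        // Distinct ints can collapse onto the same float (24 significand bits)...
        Console.WriteLine((float)2147483647 == (float)2147483646);   // True

        // ...and distinct longs can collapse onto the same double (53 significand bits).
        long big = (1L << 53) + 1;                                   // 9007199254740993
        Console.WriteLine((double)big == (double)(big - 1));         // True
    }
}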
This is in contrast to decimal, which is actually a base-10 decimal number, rather than a base-2 (binary) floating point number. There, 20M maps perfectly to 20 and vice versa, so the compiler is free to emit this:
IL_0000: ldc.i4.s 14
IL_0002: newobj System.Decimal..ctor
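In C# terms, the lowering amounts to roughly this - a sketch only, the point being that the two forms denote exactly the same value:
using System;

class DecimalLowering
{
    static void Main()
    {
        decimal viaLiteral = 20M;
        decimal viaCtor = new decimal(20);   // what the ldc.i4.s + Decimal..ctor(int32) sequence amounts to

        Console.WriteLine(viaLiteral == viaCtor);   // True: 20 and 20M map to each other exactly
    }
}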
The same approach simply isn't safe (or cheap!) for binary floating point numbers.
You might think that the two approaches are necessarily equivalent, because it doesn't really matter whether we do a conversion from an integer literal ("a string") to a double value at compile time, or whether we do it in IL. But this simply isn't the case, as a bit of specification diving unveils:
ECMA CLR spec, III.1.1.1:
Storage locations for floating-point numbers (statics, array elements, and fields of classes) are of fixed size. The supported storage sizes are float32 and float64. Everywhere else (on the evaluation stack, as arguments, as return types, and as local variables) floating-point numbers are represented using an internal floating-point type. In each such instance, the nominal type of the variable or expression is either float32 or float64, but its value might be represented internally with additional range and/or precision.
To keep things short, let's pretend float64 actually uses 4 binary digits, while the implementation-defined internal floating point type (F) uses 5 binary digits. We want to convert an integer literal whose binary representation has more than four digits. Now compare how it's going to behave:
ldc.r8 0.1011E2 ; the 5-digit literal was truncated to 4 digits at compile time, then expanded to 0.10110E2
ldc.r8 0.1E2
mul ; 0.10110E2 * 0.10000E2 == 0.10110E3
conv.r8 converts to F, not float64. So we actually get:
ldc.i4.s theSameLiteral
conv.r8 ; converted to 0.10111E2
mul ; 0.10111E2 * 0.10000E2 == 0.10111E3
Oops :)
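If you want to play with the idea, here is a toy sketch (plain C#, not real CLR behaviour) that emulates a 4-bit versus 5-bit significand; the numbers are made up purely for illustration, exactly like the 4/5-digit types above:
using System;

class PrecisionToy
{
    // Truncate 'value' to 'bits' significand bits (a toy model, not real IEEE rounding).
    static double TruncateToBits(double value, int bits)
    {
        int exponent = (int)Math.Floor(Math.Log(value, 2)) + 1;   // significand bits the value needs
        double scale = Math.Pow(2, bits - exponent);
        return Math.Floor(value * scale) / scale;
    }

    static void Main()
    {
        double literal = 23;   // binary 10111: needs 5 significand bits

        // "ldc.r8": the literal is squeezed into the 4-bit storage type at compile time,
        // then widened to the 5-bit internal type - the lost bit never comes back.
        double storedThenWidened = TruncateToBits(literal, 4);    // 10110 -> 22

        // "ldc.i4.s + conv.r8": the integer is converted straight to the 5-bit
        // internal type at run time, keeping all 5 bits.
        double convertedDirectly = TruncateToBits(literal, 5);    // 10111 -> 23

        Console.WriteLine(storedThenWidened * 2);   // 44
        Console.WriteLine(convertedDirectly * 2);   // 46
    }
}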
Now, I'm pretty sure this isn't going to happen with an integer in the range 0-255 on any reasonable platform. But since we're coding against the CLR specification, we can't make that assumption. The JIT compiler can, but that's too late. The language compiler may define the two to be equivalent, but the C# specification doesn't - a double local is considered a float64, not F. You can make your own language, if you so desire.
In any case, IL generators don't really optimise much. That's left to JIT compilation for the most part. If you want an optimised C#-IL compiler, write one - I doubt there's enough benefit to warrant the effort, especially if your only goal is to make the IL code smaller. Most IL binaries are already quite a bit smaller than the equivalent native code.
As for the actual code that runs, on my machine, both approaches result in exactly the same x86-64 assembly - load a double precision value from the data segment. The JIT can easily make this optimisation, since it knows what architecture the code is actually running on.
Upvotes: 7
Reputation: 13207
I doubt you will get a more satisfactory answer than "because no one thought it necessary to implement it."
The fact is, they could've made it this way, but as Eric Lippert has stated many times, features are chosen to be implemented rather than chosen not to be implemented. In this particular case, the feature's gain didn't outweigh the costs, e.g. additional testing and the non-trivial conversion between int and float, while in the case of ldc.i4.s + conv.i8 it's not much trouble. Also, it's better not to bloat the jitter with more optimization rules.
As shown by the Roslyn source code, the conversion is done only for long. All in all, it's entirely possible to add this feature for float or double as well, but it wouldn't be very useful except for producing shorter CIL code (useful when inlining is needed), and when you want a float constant, you usually actually use a floating point number (i.e. not an integer).
Upvotes: 3
Reputation: 23709
First, let's consider correctness. The ldc.i4.s instruction can handle integers between -128 and 127, all of which can be exactly represented in float32. However, the CLI uses an internal floating-point type called F on the evaluation stack and in other non-storage locations. The ECMA-335 standard says in III.1.1.1:
...the nominal type of the variable or expression is either float32 or float64 ... The internal representation shall have the following characteristics:
- The internal representation shall have precision and range greater than or equal to the nominal type.
- Conversions to and from the internal representation shall preserve value.
This all means that any float32 value is guaranteed to be safely represented in F, no matter what F is.
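That claim is easy to check exhaustively for the ldc.i4.s range with a throwaway sketch:
using System;

class ExactnessCheck
{
    static void Main()
    {
        bool allExact = true;
        for (int i = -128; i <= 127; i++)          // the sbyte range that ldc.i4.s can encode
        {
            float f = i;                           // int -> float32
            if ((int)f != i) allExact = false;     // would flag any value that didn't round-trip
        }
        Console.WriteLine(allExact);               // True: every value in the range is exact
    }
}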
We conclude that the alternative sequence of instructions that you have proposed is correct. Now the question is: is it better in terms of performance?
To answer this question, let's see what the JIT compiler does when it sees each code sequence. For ldc.r8 20, the answer given in the link you referenced nicely explains the ramifications of using long instructions.
Let's consider the 3-byte sequence:
ldc.i4.s 0x14
conv.r8
We can make an assumption here that is reasonable for any optimizing JIT compiler: that the JIT is capable of recognizing such a sequence of instructions and compiling the two instructions together. The compiler is given the value 0x14 in two's complement format and has to convert it to the float64 format (which is always exact for values in this range, as discussed above). On relatively modern architectures, this can be done extremely efficiently. This tiny overhead is part of the JIT time and is therefore incurred only once. The quality of the generated native code is the same for both IL sequences.
So the 9-byte sequence has a size issue which could incur anywhere from no overhead to a noticeable one (assuming we use it everywhere), and the 3-byte sequence has a one-time, tiny conversion overhead. Which one is better? Well, somebody would have to do some scientifically sound experimentation to measure the difference in performance to answer that question. I would like to stress that you should not care about this unless you are an engineer or researcher in compiler optimizations. Otherwise, you should be optimizing your code at a higher level (at the source code level).
Upvotes: 0