SledgeHammer

Reputation: 7705

C# EMIT IL performance issue

I'm working on an engine where we copy around lots and lots of properties dynamically at runtime. Depending on the situation, we may or may not modify the property value along the way. It was originally written with reflection, but due to performance issues, we recently re-wrote it in Reflection.Emit. The re-write is complete and performance is obviously a lot better, but now the code is being benchmarked against hand-written C#. Obviously, to be a fair fight, the hand-written C# for the benchmarks has "similar functionality" (you'll see what I mean in a sec) as the IL.

Some of the IL engine has been signed off on as it has passed with flying colors and is pretty much 1:1 with the hand-written C#. This tells me:

  1. there is no overhead in calling the dynamic method

  2. our general concept and implementation is correct

  3. benchmarking is correct

  4. the IL and the hand-written C# are being tested in exactly the same way, so no funny JIT business is going on (I don't think)

We went in expecting the IL to be slightly slower than the hand-written code, but that has not been the case so far. It's maybe a few ms slower in long rounds, but you can take shortcuts in IL, which helps make up the difference.

In one particular case, it's substantially slower: 2x slower.

In C#, you'd have:

class Source
{
    public string S1 { get; set; }
    public int I1 { get; set; }
    public int I2 { get; set; }
    public double D1 { get; set; }
    public double D2 { get; set; }
    public double D3 { get; set; }
}

class Dest
{
    public string S1 { get; set; }
    public int I1 { get; set; }
    public string I2 { get; set; }
    public double D1 { get; set; }
    public int D2 { get; set; }
    public string D3 { get; set; }
}

static Dest Test(Source s)
{
    Dest d = new Dest();

    object o = s.D3;

    if (o != null)
        d.D3 = o.ToString();

    return d;
}

This is what I meant by similar functionality. To be generic, when we copy a property to a string, we first box it and then call Object.ToString(). Natively, value types call ToString differently, hence the code above, to keep the comparison apples to apples.

If I comment out the D3 copy/ToString and uncomment the other 5 properties, I'm back to 1:1 with the C#.

You'll notice that I2 is int -> string, but for some reason that one doesn't have the same problem as the double -> string. I get that double ToString() is more expensive in general, but that expense should show up in the C# code too, and it doesn't.

The code I emit for the D3 copy is the same code I emit for the I2 copy. Why the huge overhead on the D3 copy?

EDIT:

The compiler emits:

IL_0000: newobj instance void ConsoleApplication3.Dest::.ctor()
IL_0005: ldarg.0
IL_0006: callvirt instance float64 ConsoleApplication3.Source::get_D3()
IL_000b: box [mscorlib]System.Double
IL_0010: stloc.0
IL_0011: dup
IL_0012: ldloc.0
IL_0013: brtrue.s IL_0018

IL_0015: ldnull
IL_0016: br.s IL_001e

IL_0018: ldloc.0
IL_0019: callvirt instance string [mscorlib]System.Object::ToString()

IL_001e: callvirt instance void ConsoleApplication3.Dest::set_D3(string)
IL_0023: ret

This particular section of my code does not emit the new for the Dest object; that's done elsewhere. The dup duplicates the Dest object reference, matching the C# above.

LocalBuilder localBuilderObject = generator.DeclareLocal(_typeOfObject);

Label labelNull = generator.DefineLabel();
Label labelNotNull = generator.DefineLabel();

generator.Emit(OpCodes.Ldarg_0);
generator.Emit(OpCodes.Callvirt, miGetter);
generator.Emit(OpCodes.Box, typeSource);
generator.Emit(OpCodes.Stloc_S, localBuilderObject);
generator.Emit(OpCodes.Dup);
generator.Emit(OpCodes.Ldloc_S, localBuilderObject);
generator.Emit(OpCodes.Brtrue, labelNotNull);
generator.Emit(OpCodes.Ldnull);
generator.Emit(OpCodes.Br, labelNull);
generator.MarkLabel(labelNotNull);
generator.Emit(OpCodes.Ldloc_S, localBuilderObject);
generator.Emit(OpCodes.Callvirt, _miToString);
generator.MarkLabel(labelNull);
generator.Emit(OpCodes.Callvirt, miSetter);

As I mentioned, I box the type so I can call Object::ToString() generically without worrying about value types. Ref types go through this path as well. The C# code is written to behave the same way, and it still takes half the time?

I've been messing with this issue all weekend. Further testing shows that other value types (int, long, etc.) are 1:1. For some reason, only the double is causing a problem.

Upvotes: 2

Views: 862

Answers (2)

Tony THONG

Reputation: 772

Jump over the setter when the value is null (brfalse) instead of using a double jump. Your benchmark may also be misleading for three reasons, depending on how you call your generated code (not posted here):

  1. You can only call it through a delegate (unless you call it from other generated code).
  2. Your regular code must also be called through a delegate to be comparable.
  3. A delegate bound to a non-static method is faster than a delegate built for a static method (the CLR will push null, jump, and pop the unused null value before doing the real work). To avoid this, generate a static method with an unused first argument of a reference type, and call Delegate.CreateDelegate with the target explicitly set to null.
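
A minimal sketch of the first two points together with the single brfalse, assuming trimmed-down Source and Dest classes that carry only D3. The Demo class, the CopyD3 method name, and the Func<Source, Dest> delegate type are illustrative choices, not the asker's actual engine. Both the generated method and the hand-written baseline are invoked through the same delegate type, so the call path is comparable:

```csharp
using System;
using System.Reflection.Emit;

public class Source { public double D3 { get; set; } }
public class Dest { public string D3 { get; set; } }

public static class Demo
{
    // Hand-written baseline: box, null check, Object.ToString().
    public static Dest HandWritten(Source s)
    {
        Dest d = new Dest();
        object o = s.D3;
        if (o != null)
            d.D3 = o.ToString();
        return d;
    }

    // Generated equivalent (illustrative): one brfalse skips the setter
    // when the boxed value is null, instead of brtrue + ldnull + br.
    public static Func<Source, Dest> BuildGenerated()
    {
        var dm = new DynamicMethod("CopyD3", typeof(Dest),
            new[] { typeof(Source) }, typeof(Demo).Module);
        ILGenerator il = dm.GetILGenerator();
        LocalBuilder boxed = il.DeclareLocal(typeof(object));
        Label skip = il.DefineLabel();

        il.Emit(OpCodes.Newobj, typeof(Dest).GetConstructor(Type.EmptyTypes)); // stack: dest
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Callvirt, typeof(Source).GetProperty("D3").GetGetMethod());
        il.Emit(OpCodes.Box, typeof(double));
        il.Emit(OpCodes.Stloc, boxed);
        il.Emit(OpCodes.Ldloc, boxed);
        il.Emit(OpCodes.Brfalse, skip);   // null: leave only dest on the stack
        il.Emit(OpCodes.Dup);             // stack: dest, dest
        il.Emit(OpCodes.Ldloc, boxed);
        il.Emit(OpCodes.Callvirt, typeof(object).GetMethod("ToString", Type.EmptyTypes));
        il.Emit(OpCodes.Callvirt, typeof(Dest).GetProperty("D3").GetSetMethod()); // stack: dest
        il.MarkLabel(skip);
        il.Emit(OpCodes.Ret);

        return (Func<Source, Dest>)dm.CreateDelegate(typeof(Func<Source, Dest>));
    }

    public static void Main()
    {
        Func<Source, Dest> generated = BuildGenerated();
        Func<Source, Dest> handWritten = HandWritten; // baseline also goes through a delegate

        var s = new Source { D3 = 1.5 };
        Console.WriteLine(generated(s).D3 == handWritten(s).D3); // prints True
    }
}
```

The stack holds exactly one Dest reference at the skip label on both paths, which is what lets a single branch replace the ldnull/br pair. Whether this closes the 2x gap is something only the benchmark can confirm.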

Upvotes: 1

Tamir Vered

Reputation: 10287

As you can see in the C# compiled code, fast local-access instructions are used:

IL_000b: box [mscorlib]System.Double
IL_0010: stloc.0
IL_0011: dup
IL_0012: ldloc.0
...
IL_0018: ldloc.0

In your generated IL, by contrast, you use stloc.s and ldloc.s, which take the local index as an additional operand.
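
One way to get the operand-free forms from Reflection.Emit is to pick the opcode from the local's index. The sketch below assumes a hypothetical helper class, LocalOpcodes, and a toy double -> string method rather than the asker's real generator; whether it changes the measured time would still need to be checked in the benchmark:

```csharp
using System;
using System.Reflection.Emit;

public static class LocalOpcodes
{
    // Hypothetical helper: emits stloc.0..stloc.3 when the index allows it,
    // otherwise falls back to stloc.s (index <= 255) or stloc.
    public static void EmitStloc(ILGenerator il, LocalBuilder local)
    {
        switch (local.LocalIndex)
        {
            case 0: il.Emit(OpCodes.Stloc_0); break;
            case 1: il.Emit(OpCodes.Stloc_1); break;
            case 2: il.Emit(OpCodes.Stloc_2); break;
            case 3: il.Emit(OpCodes.Stloc_3); break;
            default:
                il.Emit(local.LocalIndex <= 255 ? OpCodes.Stloc_S : OpCodes.Stloc, local);
                break;
        }
    }

    // Same idea for loads: ldloc.0..ldloc.3, then ldloc.s, then ldloc.
    public static void EmitLdloc(ILGenerator il, LocalBuilder local)
    {
        switch (local.LocalIndex)
        {
            case 0: il.Emit(OpCodes.Ldloc_0); break;
            case 1: il.Emit(OpCodes.Ldloc_1); break;
            case 2: il.Emit(OpCodes.Ldloc_2); break;
            case 3: il.Emit(OpCodes.Ldloc_3); break;
            default:
                il.Emit(local.LocalIndex <= 255 ? OpCodes.Ldloc_S : OpCodes.Ldloc, local);
                break;
        }
    }

    public static void Main()
    {
        // Toy method: box a double, round-trip it through local 0, call ToString().
        var dm = new DynamicMethod("BoxToString", typeof(string), new[] { typeof(double) });
        ILGenerator il = dm.GetILGenerator();
        LocalBuilder boxed = il.DeclareLocal(typeof(object)); // first local, so index 0

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Box, typeof(double));
        EmitStloc(il, boxed); // emits stloc.0
        EmitLdloc(il, boxed); // emits ldloc.0
        il.Emit(OpCodes.Callvirt, typeof(object).GetMethod("ToString", Type.EmptyTypes));
        il.Emit(OpCodes.Ret);

        var f = (Func<double, string>)dm.CreateDelegate(typeof(Func<double, string>));
        Console.WriteLine(f(2.5) == 2.5.ToString()); // prints True
    }
}
```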

Also make sure that you cache the generated method per Type it is generated for (you probably already do, if the C# is only twice as fast).
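
A minimal sketch of such a cache, assuming a hypothetical CopierCache class and a build callback that stands in for the Reflection.Emit generation step:

```csharp
using System;
using System.Collections.Concurrent;

public static class CopierCache
{
    private static readonly ConcurrentDictionary<Type, Delegate> Cache =
        new ConcurrentDictionary<Type, Delegate>();

    // 'build' is a placeholder for the code that emits and compiles the
    // dynamic method for the given type; it only runs on a cache miss.
    public static Delegate GetOrBuild(Type sourceType, Func<Type, Delegate> build)
    {
        return Cache.GetOrAdd(sourceType, build);
    }
}

public static class Program
{
    public static void Main()
    {
        int builds = 0;
        Func<Type, Delegate> build = t =>
        {
            builds++; // counts how often "generation" actually happens
            return new Func<object, string>(o => o.ToString());
        };

        Delegate first = CopierCache.GetOrBuild(typeof(double), build);
        Delegate second = CopierCache.GetOrBuild(typeof(double), build);

        Console.WriteLine(ReferenceEquals(first, second)); // prints True
        Console.WriteLine(builds);                         // prints 1
    }
}
```

Note that ConcurrentDictionary.GetOrAdd may invoke the factory more than once when several threads miss at the same time. Only one result is stored, so the cost is a redundant generation, not a correctness problem.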

Upvotes: 1
