Reputation: 43464
I am trying to understand the reason for the difference in performance between two delegates. It came up while I was trying to solve this question. @Enigmativity proposed an alternative way to type-cast that resulted in a delegate with faster invocation. Here is a minimal version of that code:
delegate void MyAction<T>(T val);
static int Counter;
// My suggestion
static MyAction<T> GetAction1<T>()
    => new MyAction<T>((Action<T>)(object)ActionInt);
// Enigmativity's suggestion
static MyAction<T> GetAction2<T>()
    => (MyAction<T>)(Delegate)(MyAction<int>)ActionInt;
static void ActionInt(int val) { Counter++; }
There is a custom generic delegate type MyAction<T> that has the same signature as the built-in Action<T>. We want to instantiate this delegate from a generic method with type parameter T, and internally bind it to the type-specific ActionInt method. You can see my approach and Enigmativity's approach. It seems that in both cases the type casting occurs during the instantiation of the MyAction<T> delegate, so invoking the resulting delegates should not incur type-casting overhead. At least this is my theory. But when I measure the performance of the resulting delegates, Enigmativity's delegate is consistently around 20% faster than mine:
static void Test(string title, MyAction<int> action)
{
    Counter = 0;
    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 100_000_000; i++) action(i);
    stopwatch.Stop();
    Console.WriteLine($"{title}, Counter: {Counter:#,0}, Duration: {stopwatch.ElapsedMilliseconds:#,0} msec");
}
Test("GetAction1", GetAction1<int>());
Test("GetAction2", GetAction2<int>());
Test("GetAction1", GetAction1<int>());
Test("GetAction2", GetAction2<int>());
Output:
GetAction1, Counter: 100,000,000, Duration: 444 msec
GetAction2, Counter: 100,000,000, Duration: 374 msec
GetAction1, Counter: 100,000,000, Duration: 447 msec
GetAction2, Counter: 100,000,000, Duration: 371 msec
Can anyone explain why this is happening?
Upvotes: 1
Views: 294
Reputation: 109567
Using a decompiler, we can discover the following:
Implementation of GetAction1<T>():
IL_0000: ldnull
IL_0001: ldftn void ConsoleApp1.UnderTest::ActionInt(int32)
IL_0007: newobj instance void class [System.Runtime]System.Action`1<int32>::.ctor(object, native int)
IL_000c: castclass class [System.Runtime]System.Action`1<!!0/*T*/>
IL_0011: ldftn instance void class [System.Runtime]System.Action`1<!!0/*T*/>::Invoke(!0/*T*/)
IL_0017: newobj instance void class ConsoleApp1.UnderTest/MyAction`1<!!0/*T*/>::.ctor(object, native int)
IL_001c: ret
Implementation of GetAction2<T>():
IL_0000: ldnull
IL_0001: ldftn void ConsoleApp1.UnderTest::ActionInt(int32)
IL_0007: newobj instance void class ConsoleApp1.UnderTest/MyAction`1<int32>::.ctor(object, native int)
IL_000c: castclass class ConsoleApp1.UnderTest/MyAction`1<!!0/*T*/>
IL_0011: ret
You can see in the first case that it is actually creating two delegates, and chaining one to the other.
In the second case it is only creating one delegate.
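I can't explain the exact reason for this, but I would think that it's because of the extra cast to object in GetAction1: the method group first becomes an Action<int>, and new MyAction<T>(...) then wraps that delegate by pointing at its Invoke method (IL_0011 and IL_0017 above), so every call goes through two delegate invocations instead of one.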
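The chaining can also be observed at run time without a decompiler. The following is a minimal sketch, assuming a C# 10 or later compiler (needed for the `(object)ActionInt` cast used in the question); the class name `DelegateInspection` and the helper `Describe` are made up for illustration. It prints the method and target that each delegate points at:

```csharp
using System;

public static class DelegateInspection
{
    public delegate void MyAction<T>(T val);

    static int Counter;
    static void ActionInt(int val) { Counter++; }

    // Same body as GetAction1 in the question.
    public static MyAction<T> GetAction1<T>()
        => new MyAction<T>((Action<T>)(object)ActionInt);

    // Same body as GetAction2 in the question.
    public static MyAction<T> GetAction2<T>()
        => (MyAction<T>)(Delegate)(MyAction<int>)ActionInt;

    // Hypothetical helper: shows which method a delegate calls and on what target.
    public static string Describe(Delegate d)
    {
        string method = d.Method.DeclaringType?.Name + "." + d.Method.Name;
        string target = d.Target?.GetType().Name ?? "null";
        return "Method=" + method + ", Target=" + target;
    }

    public static void Main()
    {
        Console.WriteLine("GetAction1: " + Describe(GetAction1<int>()));
        Console.WriteLine("GetAction2: " + Describe(GetAction2<int>()));
    }
}
```

If the IL above is representative, the first line should report the Invoke method of an Action<int> with that inner delegate as the target, while the second should report ActionInt directly with a null target (it is a static method).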
There appears to be an even faster implementation, namely:
public static MyAction<T> GetAction3<T>()
=> x => ActionInt((int)(object)x);
This generates much longer IL code:
IL_0000: ldsfld class ConsoleApp1.UnderTest/MyAction`1<!0/*T*/> class ConsoleApp1.UnderTest/'<>c__4`1'<!!0/*T*/>::'<>9__4_0'
IL_0005: dup
IL_0006: brtrue.s IL_0021
IL_0008: pop
IL_0009: ldsfld class ConsoleApp1.UnderTest/'<>c__4`1'<!0/*T*/> class ConsoleApp1.UnderTest/'<>c__4`1'<!!0/*T*/>::'<>9'
IL_000e: ldftn instance void class ConsoleApp1.UnderTest/'<>c__4`1'<!!0/*T*/>::'<GetAction3>b__4_0'(!0/*T*/)
IL_0014: newobj instance void class ConsoleApp1.UnderTest/MyAction`1<!!0/*T*/>::.ctor(object, native int)
IL_0019: dup
IL_001a: stsfld class ConsoleApp1.UnderTest/MyAction`1<!0/*T*/> class ConsoleApp1.UnderTest/'<>c__4`1'<!!0/*T*/>::'<>9__4_0'
IL_001f: stloc.0 // V_0
IL_0020: ldloc.0 // V_0
IL_0021: ret
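And yet it is faster for both the call to GetAction3() and executing the action that it returns. The call itself is presumably cheap because, as the IL shows, the compiler caches the lambda's delegate in a static field ('<>9__4_0') and only allocates it the first time through.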
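The caching visible in that IL can be checked with a reference comparison. This is a small sketch, not part of the original benchmark; the class name `DelegateCaching` is made up, and the caching of non-capturing lambdas is a compiler implementation detail rather than a language guarantee:

```csharp
using System;

public static class DelegateCaching
{
    public delegate void MyAction<T>(T val);

    static int Counter;
    static void ActionInt(int val) { Counter++; }

    // Same body as GetAction2 above: a method-group conversion plus a cast.
    public static MyAction<T> GetAction2<T>()
        => (MyAction<T>)(Delegate)(MyAction<int>)ActionInt;

    // Same body as GetAction3 above: a lambda that captures nothing.
    public static MyAction<T> GetAction3<T>()
        => x => ActionInt((int)(object)x);

    public static void Main()
    {
        // Expected to be True when the compiler caches the lambda's delegate.
        Console.WriteLine(ReferenceEquals(GetAction3<int>(), GetAction3<int>()));

        // Depends on the compiler: the IL shown above allocates on every call,
        // but newer compilers may cache method-group conversions as well.
        Console.WriteLine(ReferenceEquals(GetAction2<int>(), GetAction2<int>()));
    }
}
```

When GetAction3 hands back the same instance on every call, the Action3 benchmark measures little more than a static field load and a null check, which would fit the timings below.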
Here's the benchmark program I tested with:
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace ConsoleApp1;

public static class Program
{
    public static void Main()
    {
        var summary = BenchmarkRunner.Run<UnderTest>();
    }
}

public class UnderTest
{
    public delegate void MyAction<T>(T val);

    public static int counter;

    public static MyAction<T> GetAction1<T>()
        => new MyAction<T>((Action<T>)(object)ActionInt);

    // Enigmativity's suggestion
    public static MyAction<T> GetAction2<T>()
        => (MyAction<T>)(Delegate)(MyAction<int>)ActionInt;

    public static MyAction<T> GetAction3<T>()
        => x => ActionInt((int)(object)x);

    public static MyAction<int> Act1 = GetAction1<int>();
    public static MyAction<int> Act2 = GetAction2<int>();
    public static MyAction<int> Act3 = GetAction3<int>();

    static void ActionInt(int val) { counter++; }

    [Benchmark]
    public void Action1()
    {
        _ = GetAction1<int>();
    }

    [Benchmark]
    public void Action2()
    {
        _ = GetAction2<int>();
    }

    [Benchmark]
    public void Action3()
    {
        _ = GetAction3<int>();
    }

    [Benchmark]
    public void RunAction1()
    {
        Act1(0);
    }

    [Benchmark]
    public void RunAction2()
    {
        Act2(0);
    }

    [Benchmark]
    public void RunAction3()
    {
        Act3(0);
    }
}
And the results:
| Method | Mean | Error | StdDev |
|----------- |-----------:|----------:|----------:|
| Action1 | 13.3355 ns | 0.1670 ns | 0.1480 ns |
| Action2 | 6.9685 ns | 0.1313 ns | 0.1228 ns |
| Action3 | 1.3437 ns | 0.0321 ns | 0.0285 ns |
| RunAction1 | 2.4100 ns | 0.0454 ns | 0.0425 ns |
| RunAction2 | 1.6493 ns | 0.0594 ns | 0.0527 ns |
| RunAction3 | 0.8347 ns | 0.0295 ns | 0.0276 ns |
Of course, none of the actions actually use the int passed to them, since they all just call ActionInt(), which ignores its argument.
I suppose you could also implement it as:
public static MyAction<T> GetAction3<T>()
=> _ => ActionInt(0);
which might be even faster, but I haven't tried that.
Upvotes: 4