Reputation: 1721
Today I ran into this issue: when using reference types as type arguments for an outer generic type, methods in nested types are slower by a factor of ~10. It does not matter which types I use - all reference types seem to "slow" the code down. (Sorry for the title, maybe somebody can find a more suitable one.)
Tested with .NET 5/Release builds.
What am I missing?
EDIT 2:
I'll try to explain the problem a little further and clean up the code. If you still want to see the old version, here is a copy:
https://gist.github.com/sneusse/1b5ee408dd3fdd74fcf9d369e144b35f
The new code illustrates the same issue with hopefully less distraction.
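WthGeneric<T> is instantiated twice: once with a reference type as the type argument (object) and once with a value type (long). WhatIsHappeningHere is then called on both instances.
This leads to the question: why does the same instance method take 10x longer on one instantiation than on the other?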
Output:
System.Object: 516,8448ms
System.Int64: 50,6958ms
Code:
using System;
using System.Diagnostics;
using System.Linq;

namespace Perf
{
    public interface IWthGeneric
    {
        int WhatIsHappeningHere();
    }

    // This is a generic class. Note that the generic
    // type argument 'T' is _NOT_ used at all!
    public class WthGeneric<T> : IWthGeneric
    {
        // This is part of the issue.
        // If this field is not accessed or moved *outside*
        // of the generic 'WthGeneric' class, the code is fast again
        // ** also with reference types **
        public static int StaticVar = 12;

        static class NestedClass
        {
            public static int Add(int value) => StaticVar + value;
        }

        public int WhatIsHappeningHere()
        {
            var x = 0;
            for (int i = 0; i < 100000000; i++)
            {
                x += NestedClass.Add(i);
            }
            return x;
        }
    }

    public class RunMe
    {
        public static void Run()
        {
            // The interface is used so nothing could ever get inlined.
            var wthObject = (IWthGeneric) new WthGeneric<object>();
            var wthValueType = (IWthGeneric) new WthGeneric<long>();

            void Test(IWthGeneric instance)
            {
                var sw = Stopwatch.StartNew();
                var x = instance.WhatIsHappeningHere();
                Console.WriteLine(
                    $"{instance.GetType().GetGenericArguments().First()}: " +
                    $"{sw.Elapsed.TotalMilliseconds}ms");
            }

            for (int i = 0; i < 10; i++)
            {
                Test(wthObject);
                Test(wthValueType);
            }
        }
    }
}
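The comment on StaticVar says the code is fast again once the field is moved outside the generic class. A minimal sketch of that variant follows. The names SharedState, WthGenericFixed and Sum are made up for illustration, and I have not measured this version. Note that it changes the semantics: there is now a single StaticVar shared by all instantiations instead of one copy per WthGeneric<T>.

```csharp
using System;

// Hypothetical non-generic holder: only one copy of StaticVar exists,
// so finding it does not depend on the generic type argument.
public static class SharedState
{
    public static int StaticVar = 12;
}

// Hypothetical variant of WthGeneric<T> that reads the field from SharedState.
public class WthGenericFixed<T>
{
    static class NestedClass
    {
        public static int Add(int value) => SharedState.StaticVar + value;
    }

    // Same loop as WhatIsHappeningHere, with the iteration count as a parameter.
    public int Sum(int count)
    {
        var x = 0;
        for (int i = 0; i < count; i++)
        {
            x += NestedClass.Add(i);
        }
        return x;
    }
}

public static class Demo
{
    public static void Main()
    {
        // (12+0) + (12+1) + (12+2) + (12+3) = 54 for both instantiations.
        Console.WriteLine(new WthGenericFixed<object>().Sum(4)); // 54
        Console.WriteLine(new WthGenericFixed<long>().Sum(4));   // 54
    }
}
```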
Upvotes: 3
Views: 642
Reputation: 72470
Not 100% sure, but I think I know why the JIT is not optimizing this:
As I understand it, every generic type generally has only one version of the JITted code for all reference type arguments, instantiated over a placeholder type named System.__Canon, and the actual type argument is passed in as a typeref parameter. For value types, by contrast, each instantiation is generated separately.
This is because a reference type always looks the same to the JIT: a pointer to an object whose first field is a pointer to its typeref and method table. Value types, however, are all different, so each must be custom-built.
You say you don't use the type parameter, but actually you do. When you access a static field of a generic type, each instantiated generic type needs a separate copy of the static field.
So the shared code must now do a pointer lookup through the type parameter's typeref to get the static field's value.
But in the value type version, the typeref is statically known, so it's a straight memory access every time.
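To make the "separate copy per instantiated type" point concrete, here is a small sketch. Holder<T> and Demo are hypothetical names, not taken from the question. It only demonstrates the language-level behaviour (each closed generic type has its own static field); it says nothing about what code the JIT emits.

```csharp
using System;

// Hypothetical type for illustration; T is never used, as in the question.
public class Holder<T>
{
    // One copy of this field exists per closed type:
    // Holder<object>, Holder<string>, Holder<long>, ...
    public static int Value;
}

public static class Demo
{
    public static void Main()
    {
        Holder<object>.Value = 1;
        Holder<string>.Value = 2;
        Holder<long>.Value = 3;

        // Each instantiation keeps its own value, so code shared between
        // reference type instantiations has to find the right copy at run time.
        Console.WriteLine(Holder<object>.Value); // 1
        Console.WriteLine(Holder<string>.Value); // 2
        Console.WriteLine(Holder<long>.Value);   // 3
    }
}
```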
Upvotes: 5
Reputation: 40335
I'm ready to say this is the jitter's fault. Perhaps "fault" is too strong a word: the jitter simply does not optimize this case.
Using SharpLab to look at the JIT asm of this code:
using SharpLab.Runtime;

[JitGeneric(typeof(int))]
public class A<T>
{
    public static int X;

    public static class B
    {
        public static int C() => X;
    }
}
Note: The attribute JitGeneric(typeof(int)) tells SharpLab to JIT this code with the generic argument int. Without a generic argument, it is not possible to JIT a generic type.
We get this:
; Core CLR v5.0.321.7212 on x86
A`1[[System.Int32, System.Private.CoreLib]]..ctor()
L0000: ret
A`1+B[[System.Int32, System.Private.CoreLib]].C()
L0000: mov ecx, 0x2051c600
L0005: xor edx, edx
L0007: call 0x5e646b70
L000c: mov eax, [eax+4]
L000f: ret
Meanwhile, for this code:
using SharpLab.Runtime;

[JitGeneric(typeof(object))]
public class A<T>
{
    public static int X;

    public static class B
    {
        public static int C() => X;
    }
}
Note: Yes, this is the same class, except now I'm telling SharpLab to JIT it for the generic argument object.
We get this:
; Core CLR v5.0.321.7212 on x86
A`1[[System.__Canon, System.Private.CoreLib]]..ctor()
L0000: ret
A`1+B[[System.__Canon, System.Private.CoreLib]].C()
L0000: push ebp
L0001: mov ebp, esp
L0003: push eax
L0004: mov [ebp-4], ecx
L0007: mov edx, [ecx+0x20]
L000a: mov edx, [edx]
L000c: mov edx, [edx+8]
L000f: test edx, edx
L0011: je short L0015
L0013: jmp short L0021
L0015: mov edx, 0x2046cec4
L001a: call 0x5e4e4090
L001f: mov edx, eax
L0021: mov ecx, edx
L0023: call 0x5e4fa760
L0028: mov eax, [eax+4]
L002b: mov esp, ebp
L002d: pop ebp
L002e: ret
We observe that for the reference type generic argument, we get much longer code. Is that code necessary? Well, we are accessing a public static field of a generic class. Let us see how that looks if the other class is not nested:
using SharpLab.Runtime;

public static class Bint
{
    public static int C() => A<int>.X;
}

public static class Bobject
{
    public static int C() => A<object>.X;
}

[JitGeneric(typeof(object))]
public class A<T>
{
    public static int X;
}
We get this code:
; Core CLR v5.0.321.7212 on x86
Bint.C()
L0000: mov ecx, 0x209fc618
L0005: xor edx, edx
L0007: call 0x5e646b70
L000c: mov eax, [eax+4]
L000f: ret
Bobject.C()
L0000: mov ecx, 0x209fc618
L0005: mov edx, 1
L000a: call 0x5e646b70
L000f: mov eax, [eax+4]
L0012: ret
A`1[[System.__Canon, System.Private.CoreLib]]..ctor()
L0000: ret
Therefore, no, we don't need the long version of the code. We must conclude that the jitter is not optimizing this case appropriately.
Upvotes: 4