Reputation: 6249
I know the title is a bit vague. But what I'm trying to achieve is something like this:
Inside an abstract class:
public abstract bool TryGet<T>(string input, out T output) where T : struct;
Inside a class with this signature:
private class Param<T> : AbstractParam where T : struct
This implementation:
public override bool TryGet<TOriginal>(string input, out TOriginal output)
{
    T oTemp;
    bool res = _func(input, out oTemp); // _func is the actual function
                                        // that retrieves the value.
    output = (TOriginal)oTemp; // Compile-time error
    return res;
}
And TOriginal will always be the same type as T. This would bypass the compile-time error, but I don't want to do this because of the performance hit:
output = (TOriginal)(object)oTemp;
If these were reference types, this would provide the solution:
output = oTemp as TOriginal;
Reflection/dynamic would also solve the problem, but that performance hit is even bigger:
output = (TOriginal)(dynamic)oTemp;
I tried using unsafe code, unsuccessfully, but that might just be me.
So my best hope would be that the compiler optimizes (TOriginal)(object)oTemp to (TOriginal)oTemp, which I don't know, or that there's an unsafe approach to this.
Save me the lecture on premature optimization, I want to know this purely for research, and am interested if there's a way to get past this limitation. I realize this'll have a negligible impact on the actual performance.
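On the unsafe route: newer .NET versions ship System.Runtime.CompilerServices.Unsafe.As<TFrom, TTo>, which reinterprets a reference without boxing. The sketch below assumes that API is available (it is not something this question's original environment had) and that both type arguments really are the same type. The Reinterpret class and the Same method are hypothetical names used only for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

static class Reinterpret
{
    // Hypothetical helper: returns value as TTo without boxing.
    // Only valid when TFrom and TTo are the same type, so that is checked first.
    public static TTo Same<TFrom, TTo>(TFrom value)
        where TFrom : struct
        where TTo : struct
    {
        if (typeof(TFrom) != typeof(TTo))
            throw new InvalidCastException("TFrom and TTo must be the same type.");

        // Unsafe.As returns a ref TTo that aliases the local copy of value;
        // returning it by value copies the bits out.
        return Unsafe.As<TFrom, TTo>(ref value);
    }
}

static class Demo
{
    static void Main()
    {
        Console.WriteLine(Reinterpret.Same<int, int>(42));
    }
}
```

The typeof comparison is there to keep the failure mode an exception rather than a silent reinterpretation of unrelated bits.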
Final conclusion:
After disassembling the situation these were the results:
return (TOut)(object)_value;
00000000 push ebp
00000001 mov ebp,esp
00000003 push eax
00000004 mov dword ptr [ebp-4],ecx
00000007 cmp dword ptr ds:[003314CCh],0
0000000e je 00000015
00000010 call 61A33AD3
00000015 mov eax,dword ptr [ebp-4]
00000018 mov eax,dword ptr [eax+4]
0000001b mov esp,ebp
0000001d pop ebp
0000001e ret
return _value;
00000000 push ebp
00000001 mov ebp,esp
00000003 push eax
00000004 mov dword ptr [ebp-4],ecx
00000007 cmp dword ptr ds:[004814B4h],0
0000000e je 00000015
00000010 call 61993AA3
00000015 mov eax,dword ptr [ebp-4]
00000018 mov eax,dword ptr [eax+4]
0000001b mov esp,ebp
0000001d pop ebp
0000001e ret
Turns out today's JIT compiler optimizes this for value types, and therefore there is no performance cost.
output = (TOriginal)(object)oTemp;
This is the most optimized way of doing this :).
Thanks Eric Lippert and Ben Voigt.
A note on reference types:
When removing the struct constraint and passing a reference type (in my case a string), this optimization is NOT made.
Result:
return (TOut)(object)_value;
00000000 push ebp
00000001 mov ebp,esp
00000003 sub esp,10h
00000006 mov dword ptr [ebp-4],edx
00000009 mov dword ptr [ebp-10h],ecx
0000000c mov dword ptr [ebp-8],edx
0000000f cmp dword ptr ds:[003314B4h],0
00000016 je 0000001D
00000018 call 61A63A43
0000001d mov eax,dword ptr [ebp-8]
00000020 mov eax,dword ptr [eax+0Ch]
00000023 mov eax,dword ptr [eax]
00000025 mov dword ptr [ebp-0Ch],eax
00000028 test dword ptr [ebp-0Ch],1
0000002f jne 00000036
00000031 mov ecx,dword ptr [ebp-0Ch]
00000034 jmp 0000003C
00000036 mov eax,dword ptr [ebp-0Ch]
00000039 mov ecx,dword ptr [eax-1]
0000003c mov eax,dword ptr [ebp-10h]
0000003f mov edx,dword ptr [eax+4]
00000042 call 617D79D8
00000047 mov esp,ebp
00000049 pop ebp
0000004a ret
return _value;
00000000 push ebp
00000001 mov ebp,esp
00000003 push eax
00000004 mov dword ptr [ebp-4],ecx
00000007 cmp dword ptr ds:[003314B4h],0
0000000e je 00000015
00000010 call 61A639E3
00000015 mov eax,dword ptr [ebp-4]
00000018 mov eax,dword ptr [eax+4]
0000001b mov esp,ebp
0000001d pop ebp
0000001e ret
If you want a cheap way to cast reference types 'without proper type checking', the as operator is your solution.
Upvotes: 2
Views: 299
Reputation: 660377
I find it ironic that Ben's answer says both "use science: measure it to find out", and "here's my belief about what really happens":
I don't think there actually is ANY performance hit there. Generic methods are JITted for each value type, that process should completely eliminate any imagined performance hit.
Based upon actual disassembly shown in the updated question, the original poster claims that the jitter used apparently does this optimization at least some of the time. I have not analyzed this claim to see if it is correct; I would want to actually see the real code being compiled, the IL, and the assembly generated to understand what is going on here.
In my investigations in this area in the past, I discovered numerous situations in which the verifier and jitter were insufficiently clever, particularly around eliminating boxing penalties. Whether those have all been eliminated I do not know.
If some of them have been eliminated then I'm happy to learn that.
You therefore cannot conclude prima facie that the jitter does or does not perform this optimization of eliminating boxing. I have seen cases where it does not; we have an unverified claim here that in some cases it does.
Ben goes on to give some good advice:
But you're welcome to demonstrate that there actually is a cost, through performance data (real profiler measurements) or, at the very least, disassembly of the JIT-generated machine code.
Indeed, I strongly recommend that you do so, and on more than one jitter.
Let's start over and actually answer the questions that were asked. We should begin by simplifying and clarifying the case described:
abstract class B
{
    public abstract T M<T>() where T : struct;
}
private class D<U> : B where U : struct
{
    public override V M<V>()
    {
        U u = default(U);
        return (V)u; // compile-time error
    }
}
The original poster states that V will always be the same as U. There's the first problem. Why? Nothing whatsoever is stopping the user from calling M<bool> on an instance of D<double>. The type checker is entirely correct in noting that there might not be a conversion from U to V.
As the original poster notes, you can do an end-run around the type checker by boxing and unboxing:
return (V)(object)u; // Runtime error, not compile-time error
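To make the runtime failure concrete, here is a minimal sketch built on the B and D<U> classes above. The Demo class is a hypothetical harness added for illustration, and D<U> is declared without the private modifier so it compiles at top level.

```csharp
using System;

abstract class B
{
    public abstract T M<T>() where T : struct;
}

class D<U> : B where U : struct
{
    // The override inherits the struct constraint on V from B.M<T>.
    public override V M<V>()
    {
        U u = default(U);
        return (V)(object)u; // boxes u as a U, then unboxes it as a V
    }
}

static class Demo
{
    static void Main()
    {
        B b = new D<double>();

        // U and V are both double, so the unbox succeeds.
        double ok = b.M<double>();
        Console.WriteLine(ok);

        try
        {
            // U is double but V is bool: a boxed double cannot be unboxed as bool.
            b.M<bool>();
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException");
        }
    }
}
```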
The question then is "in the case where this does not crash and die horribly at runtime, is the boxing penalty eliminated by the jitter?"
The jitter jits a method only once and shares the code for reference type arguments, but re-jits it every time for different value type parameters. There is therefore an opportunity to eliminate the penalty when particular arguments supplied for U and V are the same value type.
I wondered that myself once a few years ago and so I checked. I have not checked more recent builds of the jitter, but last time I checked the boxing penalty was not eliminated. The jitter allocates the memory, copies the value to the heap, and then copies it right back out again.
Apparently, according to the updated question, this is no longer the case; the jitter tested now performs this optimization. Like I said, I have not verified that claim myself.
The jitter is permitted to make this optimization, but last time I checked, in practice it did not, so we know that there is at least one jitter out there in the wild that does not make this optimization.
A more interesting example is one where the type arguments actually are constrained to be equal:
abstract class E<T>
{
    public abstract U M<U>(T t) where U : T;
}
class F<V> : E<V> where V : struct
{
    public override W M<W>(V v)
    {
        return v; // Error
    }
}
Again, this is illegal, even though the C# compiler could logically deduce that W now must be identical to V.
You can again introduce casts to fix the problem, but the IL verifier's type analyzer requires that the V be boxed and unboxed as W.
And once again, the jitter could deduce that the boxing and unboxing is a no-op, and eliminate it, but the last time I checked it did not. It might now; try it and see.
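For reference, a sketch of the cast-based version of the constrained example. It uses the same box-then-unbox pattern as before; whether the jitter removes the round trip is exactly the open question, so this only shows that the code compiles and returns the value. The Demo class is a hypothetical harness.

```csharp
using System;

abstract class E<T>
{
    public abstract U M<U>(T t) where U : T;
}

class F<V> : E<V> where V : struct
{
    // The override inherits the constraint W : V. Since V is a value type,
    // W can only ever be V itself, but the cast through object is still needed.
    public override W M<W>(V v)
    {
        return (W)(object)v;
    }
}

static class Demo
{
    static void Main()
    {
        E<int> e = new F<int>();
        Console.WriteLine(e.M<int>(5));
    }
}
```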
I reported that as a possible optimization to the jitter team; they informed me that they had many higher priorities, which is a perfectly reasonable response. This is an obscure and unlikely scenario, not one that I would prioritize highly either.
If it is in fact the case that this optimization is now made, then I am pleasantly surprised.
Upvotes: 4
Reputation: 3406
You'd want to avoid casting your T/TOriginal value to object; that causes boxing, where the value type (which all structs are) is wrapped in a System.Object on the heap. There are a couple of ways to get around the casting problem. The simplest would be to put the generic type parameter on the abstract class itself instead of on the TryGet method, like:
public abstract class AbstractParam<T> where T : struct
{
    //....
    public abstract bool TryGet(string input, out T output);
}
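A fuller sketch of that first option might look like the following. The TryParse<T> delegate type, the constructor, and the Demo class are assumptions made for illustration; the question only says that _func is "the actual function that retrieves the value".

```csharp
using System;

// Assumed shape of _func: a TryParse-style delegate.
public delegate bool TryParse<T>(string input, out T output);

public abstract class AbstractParam<T> where T : struct
{
    public abstract bool TryGet(string input, out T output);
}

public sealed class Param<T> : AbstractParam<T> where T : struct
{
    private readonly TryParse<T> _func;

    public Param(TryParse<T> func)
    {
        _func = func;
    }

    // No second type parameter, so no cast and no boxing is needed.
    public override bool TryGet(string input, out T output)
    {
        return _func(input, out output);
    }
}

static class Demo
{
    static void Main()
    {
        AbstractParam<int> p = new Param<int>(int.TryParse);
        int value;
        bool ok = p.TryGet("42", out value);
        Console.WriteLine(ok + " " + value); // prints "True 42"
    }
}
```

The trade-off is that callers can no longer hold parameters of different value types behind one non-generic AbstractParam reference, which may be why the question used a generic method in the first place.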
Another option is to cast into a Nullable<TOriginal> and then call GetValueOrDefault() like so:
public override bool TryGet<TOriginal>(string input, out TOriginal output)
{
    T oTemp;
    bool res = _func(input, out oTemp);
    Nullable<TOriginal> n = oTemp as Nullable<TOriginal>;
    output = n.GetValueOrDefault();
    return res;
}
Upvotes: 0
Reputation: 283763
I'm going to give you that lecture you didn't want, because you clearly don't understand it.
There are TWO reasons for the "Measure, Measure, Measure!" (or equivalently "Profile, Profile, Profile!") approach to optimization:
1. Putting effort where it has the biggest impact. This is where the term "premature optimization" comes in.
Sometimes this reason doesn't apply (when you want to know the theory / for academic reasons).
2. To find out which implementation actually IS faster.
Modern CPUs are complex beasts, to the point where even comparing two different sequences of machine code can't show which is better, due to the intricacies of cache behavior, pipeline data dependencies, microcode, etc. And you're operating two levels higher than that (C# code -> MSIL -> machine code). There's no telling what optimizations are going to take place without measuring.
You said:
This'd bypass the compile-time error, but I don't want to do this cause of the performance hit:
output = (TOriginal)(object)oTemp;
But I don't think there actually is ANY performance hit there. Generic methods are JITted for each value type, that process should completely eliminate any imagined performance hit. But you're welcome to demonstrate that there actually is a cost, through performance data (real profiler measurements) or, at the very least, disassembly of the JIT-generated machine code.
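One way to gather such data is sketched below. It counts the bytes allocated while the cast runs in a loop (a boxing operation that survives the JIT allocates on the heap) and times the loop with Stopwatch. BoxingProbe, Cast and AllocatedBytes are hypothetical names, and GC.GetAllocatedBytesForCurrentThread is assumed to be available on the runtime in use. The numbers depend on the runtime, the build configuration and tiered compilation, so treat a single run as a hint rather than proof.

```csharp
using System;
using System.Diagnostics;

static class BoxingProbe
{
    // The cast under test. In IL this is "box T" followed by "unbox.any TOut".
    static TOut Cast<T, TOut>(T value)
        where T : struct
        where TOut : struct
    {
        return (TOut)(object)value;
    }

    // Bytes allocated on the current thread while the cast runs in a loop.
    public static long AllocatedBytes(int iterations)
    {
        long sum = 0;
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            sum += Cast<int, int>(i);
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(sum); // uses the result so the loop is not dead code
        return after - before;
    }

    static void Main()
    {
        AllocatedBytes(100000); // warm-up pass so JIT compilation is not measured

        Stopwatch sw = Stopwatch.StartNew();
        long bytes = AllocatedBytes(10000000);
        sw.Stop();

        Console.WriteLine("allocated: " + bytes + " bytes, elapsed: "
            + sw.ElapsedMilliseconds + " ms");
    }
}
```

An allocation count near zero suggests the boxing was eliminated on that runtime; a count that grows with the iteration count suggests it was not. Running it on more than one runtime, and in both Debug and Release builds, gives a fairer picture.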
In this particular case, it isn't clear why you have two different generic type parameters to begin with, if they're always the same type as you claimed. Just get rid of TOriginal and use T for the type of the output parameter.
Upvotes: 6