Reputation: 6249

What is the fastest way to convert the 'compile-time' type?

I know the title is a bit vague. But what I'm trying to achieve is something like this:

Inside an abstract class:

public abstract bool TryGet<T>(string input, out T output) where T : struct;

Inside a class with this signature:

private class Param<T> : AbstractParam where T : struct

This implementation:

public override bool TryGet<TOriginal>(string input, out TOriginal output)
{
    T oTemp;
    bool res = _func(input, out oTemp); // _func is the actual function
                                        // that retrieves the value.
    output = (TOriginal)oTemp; // Compile-time error
    return res;
}

And TOriginal will always be the same type as T. This would bypass the compile-time error, but I don't want to do it because of the performance hit:

output = (TOriginal)(object)oTemp;

If these were reference types, this would be the solution:

output = oTemp as TOriginal;

Reflection/dynamic would also solve the problem, but that performance hit is even bigger:

output = (TOriginal)(dynamic)oTemp;

I tried using unsafe code, unsuccessfully, but that might just be me.

So my best hope is either that the compiler optimizes (TOriginal)(object)oTemp to (TOriginal)oTemp, which I don't know, or that there's an unsafe approach to this.
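For reference, one possible shape of an "unsafe approach" is sketched below. It assumes a runtime where System.Runtime.CompilerServices.Unsafe is available (in-box on recent .NET versions, a separate package on older frameworks), which postdates this question. The Reinterpret class and its Cast method are hypothetical names used only for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

static class Reinterpret
{
    // Hypothetical helper: returns value as TTo without boxing,
    // but only when TFrom and TTo are exactly the same type.
    public static TTo Cast<TFrom, TTo>(TFrom value)
        where TFrom : struct
        where TTo : struct
    {
        // Guard: reinterpreting between different types would be unsound.
        if (typeof(TFrom) != typeof(TTo))
            throw new InvalidCastException();

        // Unsafe.As reinterprets the reference; the return copies the value out.
        return Unsafe.As<TFrom, TTo>(ref value);
    }

    static void Main()
    {
        Console.WriteLine(Cast<int, int>(42));
    }
}
```

Whether this is actually faster than (TOriginal)(object)oTemp on a given jitter is something that would still have to be measured.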

Save me the lecture on premature optimization: I want to know this purely for research, and I'm interested in whether there's a way to get past this limitation. I realize this will have a negligible impact on actual performance.

Final conclusion:
After disassembling the situation these were the results:

                return (TOut)(object)_value;
00000000  push        ebp 
00000001  mov         ebp,esp 
00000003  push        eax 
00000004  mov         dword ptr [ebp-4],ecx 
00000007  cmp         dword ptr ds:[003314CCh],0 
0000000e  je          00000015 
00000010  call        61A33AD3 
00000015  mov         eax,dword ptr [ebp-4] 
00000018  mov         eax,dword ptr [eax+4] 
0000001b  mov         esp,ebp 
0000001d  pop         ebp 
0000001e  ret 

                return _value;
00000000  push        ebp 
00000001  mov         ebp,esp 
00000003  push        eax 
00000004  mov         dword ptr [ebp-4],ecx 
00000007  cmp         dword ptr ds:[004814B4h],0 
0000000e  je          00000015 
00000010  call        61993AA3 
00000015  mov         eax,dword ptr [ebp-4] 
00000018  mov         eax,dword ptr [eax+4] 
0000001b  mov         esp,ebp 
0000001d  pop         ebp 
0000001e  ret

It turns out that today's JIT compiler optimizes this away, so there is no performance cost.

output = (TOriginal)(object)oTemp;

This is the most optimized way of doing this :).

Thanks Eric Lippert and Ben Voigt.

A note on reference types:
When removing the struct constraint and passing a reference type (in my case a string), this optimization is NOT made.

Result:

                return (TOut)(object)_value;
00000000  push        ebp 
00000001  mov         ebp,esp 
00000003  sub         esp,10h 
00000006  mov         dword ptr [ebp-4],edx 
00000009  mov         dword ptr [ebp-10h],ecx 
0000000c  mov         dword ptr [ebp-8],edx 
0000000f  cmp         dword ptr ds:[003314B4h],0 
00000016  je          0000001D 
00000018  call        61A63A43 
0000001d  mov         eax,dword ptr [ebp-8] 
00000020  mov         eax,dword ptr [eax+0Ch] 
00000023  mov         eax,dword ptr [eax] 
00000025  mov         dword ptr [ebp-0Ch],eax 
00000028  test        dword ptr [ebp-0Ch],1 
0000002f  jne         00000036 
00000031  mov         ecx,dword ptr [ebp-0Ch] 
00000034  jmp         0000003C 
00000036  mov         eax,dword ptr [ebp-0Ch] 
00000039  mov         ecx,dword ptr [eax-1] 
0000003c  mov         eax,dword ptr [ebp-10h] 
0000003f  mov         edx,dword ptr [eax+4] 
00000042  call        617D79D8 
00000047  mov         esp,ebp 
00000049  pop         ebp 
0000004a  ret 

                return _value;
00000000  push        ebp 
00000001  mov         ebp,esp 
00000003  push        eax 
00000004  mov         dword ptr [ebp-4],ecx 
00000007  cmp         dword ptr ds:[003314B4h],0 
0000000e  je          00000015 
00000010  call        61A639E3 
00000015  mov         eax,dword ptr [ebp-4] 
00000018  mov         eax,dword ptr [eax+4] 
0000001b  mov         esp,ebp 
0000001d  pop         ebp 
0000001e  ret

If you want a cheap way to cast reference types 'without proper type checking', the as operator is your solution.

Upvotes: 2

Answers (3)

Eric Lippert

Reputation: 660377

I find it ironic that Ben's answer says both "use science: measure it to find out", and "here's my belief about what really happens":

"I don't think there actually is ANY performance hit there. Generic methods are JITted for each value type, that process should completely eliminate any imagined performance hit."

Based upon actual disassembly shown in the updated question, the original poster claims that the jitter used apparently does this optimization at least some of the time. I have not analyzed this claim to see if it is correct; I would want to actually see the real code being compiled, the IL, and the assembly generated to understand what is going on here.

In my investigations in this area in the past, I discovered numerous situations in which the verifier and jitter were insufficiently clever, particularly around eliminating boxing penalties. Whether those have all been eliminated I do not know.

If some of them have been eliminated then I'm happy to learn that.

You therefore cannot conclude prima facie that the jitter does or does not perform this optimization of eliminating boxing. I have seen cases where it does not; we have an unverified claim here that in some cases it does.

Ben goes on to give some good advice:

"But you're welcome to demonstrate that there actually is a cost, through performance data (real profiler measurements) or, at the very least, disassembly of the JIT-generated machine code."

Indeed, I strongly recommend that you do so, and on more than one jitter.

Let's start over and actually answer the questions that were asked. We should begin by simplifying and clarifying the case described:

abstract class B
{
    public abstract T M<T>() where T : struct; 
}

private class D<U> : B where U : struct
{
    public override V M<V>() 
    {
        U u = default(U);
        return (V)u; // compile-time error
    } 
}

The original poster states that V will always be the same as U. There's the first problem. Why? Nothing whatsoever is stopping the user from calling M<bool> on an instance of D<double>. The type checker is entirely correct in noting that there might not be a conversion from U to V.

As the original poster notes, you can do an end-run around the type checker by boxing and unboxing:

        return (V)(object)u; // Runtime error, not compile-time error
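To make the runtime failure concrete, here is a minimal self-contained sketch built on the simplified classes above. The Demo class and its Main method are illustrative additions, and the private modifier is dropped so that D<U> can be declared at top level.

```csharp
using System;

abstract class B
{
    public abstract T M<T>() where T : struct;
}

class D<U> : B where U : struct
{
    public override V M<V>()
    {
        U u = default(U);
        // Box u as object, then unbox it as V.
        // This compiles for any V, but only succeeds at runtime when V is U.
        return (V)(object)u;
    }
}

static class Demo
{
    static void Main()
    {
        B b = new D<double>();

        // V == U: the unbox succeeds.
        Console.WriteLine(b.M<double>());

        // V != U: a boxed double cannot be unboxed as bool.
        try
        {
            b.M<bool>();
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException");
        }
    }
}
```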

The question then is "in the case where this does not crash and die horribly at runtime, is the boxing penalty eliminated by the jitter?"

The jitter jits a method only once and shares the code for reference type arguments, but re-jits it every time for different value type parameters. There is therefore an opportunity to eliminate the penalty when particular arguments supplied for U and V are the same value type.

I wondered that myself once a few years ago and so I checked. I have not checked more recent builds of the jitter, but last time I checked the boxing penalty was not eliminated. The jitter allocates the memory, copies the value to the heap, and then copies it right back out again.
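One way to check for that heap allocation on your own jitter, short of reading disassembly, is to count allocated bytes around a loop. The sketch below assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread (recent .NET versions do); BoxingProbe and BoxCast are hypothetical names, not anything from the question.

```csharp
using System;

static class BoxingProbe
{
    // The cast under test: box the value as object, then unbox it as TOut.
    public static TOut BoxCast<TIn, TOut>(TIn value)
        where TIn : struct
        where TOut : struct
    {
        return (TOut)(object)value;
    }

    static void Main()
    {
        const int iterations = 1000000;
        long sum = 0;

        // Warm up so the method is JIT-compiled before counting starts.
        sum += BoxCast<int, int>(1);

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            sum += BoxCast<int, int>(i);
        }
        long after = GC.GetAllocatedBytesForCurrentThread();

        // The checksum keeps the loop from being discarded as dead code.
        Console.WriteLine("Checksum: " + sum);
        Console.WriteLine("Bytes allocated: " + (after - before));
    }
}
```

A byte count near zero suggests the box was eliminated on that runtime; a count that grows with the iteration count suggests it was not. Debug builds and tiered compilation may change the result, so treat a single run as a hint rather than proof.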

Apparently, according to the updated question, this is no longer the case; the jitter tested now performs this optimization. Like I said, I have not verified that claim myself.

The jitter is permitted to make this optimization, but last time I checked, in practice it did not, so we know that there is at least one jitter out there in the wild that does not make this optimization.

A more interesting example is one where the type arguments actually are constrained to be equal:

abstract class E<T>
{
    public abstract U M<U>(T t) where U : T; 
}
class F<V> : E<V> where V : struct
{
    public override W M<W>(V v)
    {
        return v; // Error 
    }
}

Again, this is illegal, even though the C# compiler could logically deduce that W now must be identical to V.

You can again introduce casts to fix the problem, but the IL verifier's type analyzer requires that the V be boxed and unboxed as W.
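As a sketch, the version with the casts introduced might look like this. The Demo class is an illustrative addition; the rest is the example above with only the return statement changed.

```csharp
using System;

abstract class E<T>
{
    public abstract U M<U>(T t) where U : T;
}

class F<V> : E<V> where V : struct
{
    public override W M<W>(V v)
    {
        // Box v as object, then unbox it as W.
        return (W)(object)v;
    }
}

static class Demo
{
    static void Main()
    {
        E<int> e = new F<int>();
        Console.WriteLine(e.M<int>(42));
    }
}
```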

And once again, the jitter could deduce that the boxing and unboxing is a no-op, and eliminate it, but the last time I checked it did not. It might now; try it and see.

I reported that as a possible optimization to the jitter team; they informed me that they had many higher priorities, which is a perfectly reasonable response. This is an obscure and unlikely scenario, not one that I would prioritize highly either.

If it is in fact the case that this optimization is now made, then I am pleasantly surprised.

Upvotes: 4

doogle

Reputation: 3406

You'd want to avoid casting your T/TOriginal value to object: that causes boxing, where the value type (which all structs are) is wrapped in a System.Object on the heap. There are a couple of ways to get around the casting problem. The simplest is to give your abstract class a class-level generic type parameter instead of one on the TryGet method, like:

    public abstract class AbstractParam<T> where T : struct
    {
        //....
        public abstract bool TryGet(string input, out T output);
    }
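A fuller sketch of how the derived class could then look is below. The TryParser<T> delegate, the Param<T> constructor, and the Demo class are assumptions made for illustration; the question does not show how _func is declared.

```csharp
using System;

// Hypothetical delegate type for the parsing function (the question's _func).
public delegate bool TryParser<T>(string input, out T output);

public abstract class AbstractParam<T> where T : struct
{
    public abstract bool TryGet(string input, out T output);
}

public class Param<T> : AbstractParam<T> where T : struct
{
    private readonly TryParser<T> _func;

    public Param(TryParser<T> func)
    {
        _func = func;
    }

    public override bool TryGet(string input, out T output)
    {
        // No cast needed: the out parameter already has type T.
        return _func(input, out output);
    }
}

static class Demo
{
    static void Main()
    {
        AbstractParam<int> p = new Param<int>(int.TryParse);
        int value;
        bool ok = p.TryGet("42", out value);
        Console.WriteLine(ok + " " + value);
    }
}
```

The trade-off is that AbstractParam<int> and AbstractParam<double> no longer share a common generic-free base type, so this only fits if callers know T at the call site.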

Another option is to cast to a Nullable<TOriginal> and then call GetValueOrDefault(), like so:

        public override bool TryGet<TOriginal>(string input, out TOriginal output)
        {
            T oTemp;
            bool res = _func(input, out oTemp);
            Nullable<TOriginal> n = oTemp as Nullable<TOriginal>;
            output = n.GetValueOrDefault();
            return res;
        }

Upvotes: 0

Ben Voigt

Reputation: 283763

I'm going to give you that lecture you didn't want, because you clearly don't understand it.

There are TWO reasons for the "Measure, Measure, Measure!" (or equivalently "Profile, Profile, Profile!") approach to optimization:

1. Putting effort where it has the biggest impact. This is where the term "premature optimization" comes in. Sometimes this reason doesn't apply (when you want to know the theory / for academic reasons).

2. To find out which implementation actually IS faster.

Modern CPUs are complex beasts, to the point where even comparing two different sequences of machine code can't show which is better, due to the intricacies of cache behavior, pipeline data dependencies, microcode, etc. And you're operating two levels higher than that (C# code -> MSIL -> machine code). There's no telling what optimizations are going to take place without measuring.

You said:

This'd bypass the compile-time error, but I don't want to do this cause of the performance hit:
output = (TOriginal)(object)oTemp;

But I don't think there actually is ANY performance hit there. Generic methods are JITted for each value type, that process should completely eliminate any imagined performance hit. But you're welcome to demonstrate that there actually is a cost, through performance data (real profiler measurements) or, at the very least, disassembly of the JIT-generated machine code.
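A rough way to gather that kind of performance data is a Stopwatch loop such as the sketch below. It is a crude microbenchmark, not a profiler; CastTiming, Direct, and ViaObject are hypothetical names, and the iteration count is arbitrary.

```csharp
using System;
using System.Diagnostics;

static class CastTiming
{
    // Baseline: return the value with no cast at all.
    public static T Direct<T>(T value) where T : struct
    {
        return value;
    }

    // The cast in question: box as object, then unbox as TOut.
    public static TOut ViaObject<TIn, TOut>(TIn value)
        where TIn : struct
        where TOut : struct
    {
        return (TOut)(object)value;
    }

    static void Main()
    {
        const int iterations = 100000000;
        long sum = 0;

        // Warm up both methods so JIT compilation is not timed.
        sum += Direct(1) + ViaObject<int, int>(1);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) sum += Direct(i);
        sw.Stop();
        Console.WriteLine("Direct:    " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        for (int i = 0; i < iterations; i++) sum += ViaObject<int, int>(i);
        sw.Stop();
        Console.WriteLine("ViaObject: " + sw.ElapsedMilliseconds + " ms");

        // The checksum keeps the loops from being discarded as dead code.
        Console.WriteLine("Checksum: " + sum);
    }
}
```

Run it as a Release build without a debugger attached, and repeat it on each jitter you care about; the numbers are only meaningful relative to each other on the same machine.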

In this particular case, it isn't clear why you have two different generic type parameters to begin with, if they're always the same type as you claimed. Just get rid of TOriginal and use T for the type of the output parameter.

Upvotes: 6
