Reputation: 21

c# Generics - unexpected performance results

I believe Microsoft claims that generics are faster than plain polymorphism when dealing with reference types. However, the following simple test (64-bit, VS2012) indicates otherwise: I typically get 10% faster stopwatch times using polymorphism. Am I misinterpreting the results?

public interface Base { Int64 Size { get; } }
public class Derived : Base { public Int64 Size { get { return 10; } } }

public class GenericProcessor<TT> where TT : Base
{
    private Int64 sum;
    public GenericProcessor(){ sum = 0; }
    public void process(TT o){ sum += o.Size; }
    public Int64 Sum { get { return sum; } }
}
public class PolymorphicProcessor
{
    private Int64 sum;
    public PolymorphicProcessor(){ sum = 0; }
    public void process(Base o){ sum += o.Size; }
    public Int64 Sum { get { return sum; } }
}
static void Main(string[] args)
{
    var generic_processor = new GenericProcessor<Derived>();
    var polymorphic_processor = new PolymorphicProcessor();
    Stopwatch sw = new Stopwatch();
    int N = 100000000;
    var derived = new Derived();

    sw.Start();
    for (int i = 0; i < N; ++i) generic_processor.process(derived);
    sw.Stop();
    Console.WriteLine("Sum ="+generic_processor.Sum + " Generic performance = " + sw.ElapsedMilliseconds + " millisec");

    sw.Restart(); // Restart() resets and starts the stopwatch, so no separate Start() is needed
    for (int i = 0; i < N; ++i) polymorphic_processor.process(derived);
    sw.Stop();
    Console.WriteLine("Sum ="+polymorphic_processor.Sum+ " Poly performance = " + sw.ElapsedMilliseconds + " millisec");
}

Even more surprising (and confusing): if I add a type cast to the polymorphic version of the processor as follows, it runs consistently ~20% faster than the generic version.

        public void process(Base trade)
        {
            sum += ((Derived)trade).Size; // cast not needed - just an experiment
        }

What's going on here? I understand generics can help avoid costly boxing and unboxing when dealing with primitive types, but I'm dealing strictly with reference types here.

Upvotes: 2

Answers (2)

usr

Reputation: 171236

Run the test under .NET 4.5 x64 with Ctrl-F5 (without the debugger attached), and with N increased by 10x. That way the results reproduce reliably, no matter what order the tests are in.

With generics on ref types you still get the same vtable/interface lookup because there's just one compiled method for all ref types. There's no specialization for Derived. Performance of executing the callvirt should be the same based on this.

Furthermore, generic methods have a hidden method argument that is typeof(T) (because this allows you to actually write typeof(T) in generic code!). This is additional overhead explaining why the generic version is slower.
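To illustrate why shared code needs type information at run time, here is a minimal sketch. TypeProbe<T> is a hypothetical class, not part of the question's code. It only shows that typeof(T) gives a different answer per instantiation even when T is a reference type, so the runtime has to carry the exact type somewhere. How the CLR passes it is an implementation detail that this sketch does not reveal.

```csharp
using System;

// Hypothetical helper, not part of the question's code.
public class TypeProbe<T> where T : class
{
    // typeof(T) must resolve to the exact instantiation at run time,
    // even if the JIT shares one native body across reference types.
    public Type Describe() { return typeof(T); }
}

public static class TypeProbeDemo
{
    public static void Main()
    {
        Console.WriteLine(new TypeProbe<string>().Describe());    // System.String
        Console.WriteLine(new TypeProbe<Exception>().Describe()); // System.Exception
    }
}
```

Whether that lookup costs anything measurable in a given loop is best checked in the disassembly rather than assumed.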

Why is the cast faster than the interface call? The cast is just a pointer compare and a perfectly predictable branch. After the cast the concrete type of the object is known, allowing for a faster call.

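// Pseudocode for what the JIT effectively does with ((Derived)trade).Size:
if (trade.GetType() != typeof(Derived)) throw new InvalidCastException();
Derived.Size(trade); // calls the concrete method directly, potentially inlining it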

All of this is educated guessing. Validate by looking at the disassembly.

If you add the cast you get the following assembly:

(screenshot of the disassembly; image not reproduced here)

My assembly skills are not enough to fully decode this. However:

#16 loads the vtable ptr of Derived
#22 and #25 are the branch to test the vtable. This completes the cast.
At #32 the cast is done. Note that after this point there is no call. Size was inlined.
#35 a lea implements the add
#39 stores back to this.sum

The same trick works with the generic version (((Derived)(Base)o).Size).
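As a sketch of what that looks like in full, the generic processor from the question could be written as below. CastingGenericProcessor is a hypothetical name introduced here for illustration. Whether the JIT actually inlines Size in this form is an assumption that should be confirmed in the disassembly.

```csharp
using System;

public interface Base { Int64 Size { get; } }
public class Derived : Base { public Int64 Size { get { return 10; } } }

// Hypothetical variant of the question's GenericProcessor<TT>.
public class CastingGenericProcessor<TT> where TT : Base
{
    private Int64 sum;

    public void process(TT o)
    {
        // Converting to Base first makes the cast to Derived legal for any TT.
        // Throws InvalidCastException if o is not a Derived.
        sum += ((Derived)(Base)o).Size;
    }

    public Int64 Sum { get { return sum; } }
}

public static class CastDemo
{
    public static void Main()
    {
        var processor = new CastingGenericProcessor<Derived>();
        for (int i = 0; i < 3; ++i) processor.process(new Derived());
        Console.WriteLine(processor.Sum); // 30
    }
}
```

The cast ties the processor to Derived, which defeats the purpose of the type parameter, so this is only useful as an experiment.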

Upvotes: 2

Nick Bray

Reputation: 1963

I believe Servy was correct: it is a problem with your test. I reversed the order of the tests (just a hunch):

internal class Program
{
    public interface Base
    {
        Int64 Size { get; }
    }

    public class Derived : Base
    {
        public Int64 Size
        {
            get
            {
                return 10;
            }
        }
    }

    public class GenericProcessor<TT>
        where TT : Base
    {
        private Int64 sum;

        public GenericProcessor()
        {
            sum = 0;
        }

        public void process(TT o)
        {
            sum += o.Size;
        }

        public Int64 Sum
        {
            get
            {
                return sum;
            }
        }
    }

    public class PolymorphicProcessor
    {
        private Int64 sum;

        public PolymorphicProcessor()
        {
            sum = 0;
        }

        public void process(Base o)
        {
            sum += o.Size;
        }

        public Int64 Sum
        {
            get
            {
                return sum;
            }
        }
    }

    private static void Main(string[] args)
    {
        var generic_processor = new GenericProcessor<Derived>();
        var polymorphic_processor = new PolymorphicProcessor();
        Stopwatch sw = new Stopwatch();
        int N = 100000000;
        var derived = new Derived();
        sw.Start();
        for (int i = 0; i < N; ++i) polymorphic_processor.process(derived);
        sw.Stop();
        Console.WriteLine(
            "Sum =" + polymorphic_processor.Sum + " Poly performance = " + sw.ElapsedMilliseconds + " millisec");


        sw.Restart(); // Restart() resets and starts the stopwatch
        for (int i = 0; i < N; ++i) generic_processor.process(derived);
        sw.Stop();
        Console.WriteLine(
            "Sum =" + generic_processor.Sum + " Generic performance = " + sw.ElapsedMilliseconds + " millisec");

        Console.Read();
    }
}

In this case the polymorphic version is slower in my tests. This suggests that whichever test runs first is significantly slower than the second one. It could be loading classes the first time, preemptions, who knows ...

I just want to note that I am not arguing that generics are faster or as fast. I'm simply trying to prove that these kinds of tests don't make a case one way or the other.
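One way to reduce the order effect is to run each loop once untimed before measuring it, then take several timings. The following is a minimal sketch of that idea, not a rigorous benchmark. Measure and WarmedUpBenchmark are hypothetical names, and the choice of three rounds is arbitrary.

```csharp
using System;
using System.Diagnostics;

public interface Base { Int64 Size { get; } }
public class Derived : Base { public Int64 Size { get { return 10; } } }

public class GenericProcessor<TT> where TT : Base
{
    private Int64 sum;
    public void process(TT o) { sum += o.Size; }
    public Int64 Sum { get { return sum; } }
}

public class PolymorphicProcessor
{
    private Int64 sum;
    public void process(Base o) { sum += o.Size; }
    public Int64 Sum { get { return sum; } }
}

public static class WarmedUpBenchmark
{
    // Hypothetical helper: runs the action once untimed, then times `rounds` runs.
    static long[] Measure(Action action, int rounds)
    {
        action(); // warm-up pass, so first-call costs are not included in the timings
        var results = new long[rounds];
        for (int r = 0; r < rounds; ++r)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            results[r] = sw.ElapsedMilliseconds;
        }
        return results;
    }

    public static void Main()
    {
        const int N = 100000000;
        var derived = new Derived();
        var generic = new GenericProcessor<Derived>();
        var poly = new PolymorphicProcessor();

        long[] g = Measure(() => { for (int i = 0; i < N; ++i) generic.process(derived); }, 3);
        long[] p = Measure(() => { for (int i = 0; i < N; ++i) poly.process(derived); }, 3);

        Console.WriteLine("Generic (ms): " + string.Join(", ", g));
        Console.WriteLine("Poly    (ms): " + string.Join(", ", p));
        Console.WriteLine("Sums: " + generic.Sum + " / " + poly.Sum);
    }
}
```

The lambdas capture the processors in a closure, which may change the generated code compared with the original inline loops. Both variants pay that cost equally, but the absolute numbers are not directly comparable with the question's.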

Upvotes: 1
