Boaz
Boaz

Reputation: 26109

Why does casting List<T> into IList<T> result in reduced performance?

I was doing some performance metrics and I ran into something that seems quite odd to me. I time the following two functions:

  private static void DoOne()
      {
         List<int> A = new List<int>();
         for (int i = 0; i < 200; i++) A.Add(i);
          int s=0;
         for (int j = 0; j < 100000; j++)
         {
            for (int c = 0; c < A.Count; c++) s += A[c];
         }

      }

   private static void DoTwo()
      {
         List<int> A = new List<int>();
         for (int i = 0; i < 200; i++) A.Add(i);
         IList<int> L = A;
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int c = 0; c < L.Count; c++) s += L[c];
         }

      }

Even when compiling in release mode, the timings results were consistently showing that DoTwo takes ~100 longer then DoOne:

 DoOne took 0.06171706 seconds.
 DoTwo took 8.841709 seconds.

Given the fact the List directly implements IList I was very surprised by the results. Can anyone clarify this behavior?

The gory details

Responding to questions, here is the full code and an image of the project build preferences:

Dead Image Link

using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics;
using System.Collections;

namespace TimingTests
{
   class Program
   {
      static void Main(string[] args)
      {
         Stopwatch SW = new Stopwatch();
         SW.Start();
         DoOne();
         SW.Stop();

         Console.WriteLine(" DoOne took {0} seconds.", ((float)SW.ElapsedTicks) / Stopwatch.Frequency);
         SW.Reset();
         SW.Start();
         DoTwo();
         SW.Stop();

         Console.WriteLine(" DoTwo took {0} seconds.", ((float)SW.ElapsedTicks) / Stopwatch.Frequency);

      }

      private static void DoOne()
      {
         List<int> A = new List<int>();
         for (int i = 0; i < 200; i++) A.Add(i);
         int s=0;
         for (int j = 0; j < 100000; j++)
         {
            for (int c = 0; c < A.Count; c++) s += A[c];
         }

      }
      private static void DoTwo()
      {
         List<int> A = new List<int>();
         for (int i = 0; i < 200; i++) A.Add(i);
         IList<int> L = A;
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int c = 0; c < L.Count; c++) s += L[c];
         }

      }
   }
}

Thanks for all the good answers (especially @kentaromiura). I would have closed the question, though I feel we still miss an important part of the puzzle. Why would accessing a class via an interface it implements be so much slower? The only difference I can see is that accessing a function via an Interface implies using virtual tables while the normally the functions can be called directly. To see whether this is the case I made a couple of changes to the above code. First I introduced two almost identical classes:

  public class VC
  {
     virtual public int f() { return 2; }
     virtual public int Count { get { return 200; } }

  }

  public class C
  {
      public int f() { return 2; }
      public int Count { get { return 200; } }

  }

As you can see VC is using virtual functions and C doesn't. Now to DoOne and DoTwo:

    private static void DoOne()
      {  C a = new C();
         int s=0;
         for (int j = 0; j < 100000; j++)
         {
            for (int c = 0; c < a.Count; c++) s += a.f();
         }

      }
      private static void DoTwo()
      {
           VC a = new VC();
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int c = 0; c < a.Count; c++) s +=  a.f();
         }

      }

And indeed:

DoOne took 0.01287789 seconds.
DoTwo took 8.982396 seconds.

This is even more scary - virtual function calls 800 times slower?? so a couple of question to the community:

  1. Can you reproduce? (given the fact that all had worse performance before, but not as bad as mine)
  2. Can you explain?
  3. (this may be the most important) - can you think of a way to avoid?

Boaz

Upvotes: 14

Views: 4964

Answers (7)

Boaz
Boaz

Reputation: 26109

First I want to thank all for their answers. It was really essential in the path figuring our what was going on. Special thanks goes to @kentaromiura which found the key needed to get to the bottom of things.

The source of the slow down of using List<T> via an IList<T> interface is the lack of the ability of the JIT complier to inline the Item property get function. The use of virtual tables caused by accessing the list through it's IList interface prevents that from happening.

As a proof , I have written the following code:

      public class VC
      {
         virtual public int f() { return 2; }
         virtual public int Count { get { return 200; } }

      }

      public class C
      {
         //[MethodImpl( MethodImplOptions.NoInlining)]
          public int f() { return 2; }
          public int Count 
          {
            // [MethodImpl(MethodImplOptions.NoInlining)] 
            get { return 200; } 
          }

      }

and modified the DoOne and DoTwo classes to the following:

      private static void DoOne()
      {
         C c = new C();
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int i = 0; i < c.Count; i++) s += c.f();
         }

      }
      private static void DoTwo()
      {
         VC c = new VC();
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int i = 0; i < c.Count; i++) s += c.f();
         }

      }

Sure enough the function times are now very similar to before:

 DoOne took 0.01273598 seconds.
 DoTwo took 8.524558 seconds.

Now, if you remove the comments before the MethodImpl in the C class (forcing the JIT not to inline) - the timing becomes:

DoOne took 8.734635 seconds.
DoTwo took 8.887354 seconds.

Voila - the methods take almost the same time. You can stil see that method DoOne is still slightly fast , which is consistent which the extra overhead of a virtual function.

Upvotes: 8

LukeH
LukeH

Reputation: 269328

My tests show the interface version to be about 3x slower when compiled in release mode. When compiled in debug mode they're almost neck-and-neck.

--------------------------------------------------------
 DoOne Release (ms) |  92 |  91 |  91 |  92 |  92 |  92
 DoTwo Release (ms) | 313 | 313 | 316 | 352 | 320 | 318
--------------------------------------------------------
 DoOne Debug (ms)   | 535 | 534 | 548 | 536 | 534 | 537
 DoTwo Debug (ms)   | 566 | 570 | 569 | 565 | 568 | 571
--------------------------------------------------------

EDIT

In my tests I used a slightly modified version of the DoTwo method so that it was directly comparable to DoOne. (This change didn't make any discernible difference to the performance.)

private static void DoTwo()
{
    IList<int> A = new List<int>();
    for (int i = 0; i < 200; i++) A.Add(i);
    int s = 0;
    for (int j = 0; j < 100000; j++)
    {
       for (int c = 0; c < A.Count; c++) s += A[c];
    }
}

The only difference between the IL generated for DoOne and (modified) DoTwo is that the callvirt instructions for Add, get_Item and get_Count use IList and ICollection rather than List itself.

I'm guessing that the runtime has to do more work to find the actual method implementation when the callvirt is through an interface (and that the JIT compiler/optimiser can do a better job with the non-interface calls than the interface calls when compiling in release mode).

Can anybody confirm this?

Upvotes: 3

kentaromiura
kentaromiura

Reputation: 6499

Profiling one on one:

Testing with Snippet compiler.

using your code results:

0.043s vs 0.116s

eliminating Temporary L

0.043s vs 0.116s - ininfluent

by caching A.count in cmax on both Methods

0.041s vs 0.076s

     IList<int> A = new List<int>();
     for (int i = 0; i < 200; i++) A.Add(i);

     int s = 0;
     for (int j = 0; j < 100000; j++)
     {
        for (int c = 0,cmax=A.Count;c< cmax;  c++) s += A[c];
     }

Now I will try to slow down DoOne, first try, casting to IList before add:

for (int i = 0; i < 200; i++) ((IList<int>)A).Add(i);

0,041s 0,076s - so add is ininfluent

so it remains only one place where the slowdown can happen : s += A[c]; so I try this:

s += ((IList<int>)A)[c];

0.075s 0.075s - TADaaan!

so seems that accessing Count or a index element is slower on the interfaced version:

EDIT: Just for fun take a look at this:

 for (int c = 0,cmax=A.Count;c< cmax;  c++) s += ((List<int>)A)[c];

0.041s 0.050s

so is not a cast problem, but a reflection one!

Upvotes: 9

Eric Lippert
Eric Lippert

Reputation: 660004

A note to everyone out there who is trying to benchmark stuff like this.

Do not forget that the code is not jitted until the first time it runs. That means that the first time you run a method, the cost of running that method could be dominated by the time spent loading the IL, analyzing the IL, and jitting it into machine code, particularly if it is a trivial method.

If what you're trying to do is compare the "marginal" runtime cost of two methods, it's a good idea to run both of them twice and consider only the second runs for comparison purposes.

Upvotes: 27

Eric Lippert
Eric Lippert

Reputation: 660004

I am seeing some significant penalty for the interface version but nowhere near the magnitude penalty you are seeing.

Can you post a small, complete program that demonstrates the behaviour along with exactly how you are compiling it and exactly what version of the framework you are using?

Upvotes: 4

cfeduke
cfeduke

Reputation: 23226

I've ran this using Jon Skeet's Benchmark Helper and I am not seeing the results you are, the execution time is approximately the same between the two methods.

Upvotes: 2

arul
arul

Reputation: 14084

I believe that the problem lies in your time metrics, what are you using to measure the elapsed time?

Just for the record, here are my results:

DoOne() -> 295 ms
DoTwo() -> 291 ms

And the code:

        Stopwatch sw = new Stopwatch();

        sw.Start();
        {
            DoOne();
        }
        sw.Stop();

        Console.WriteLine("DoOne() -> {0} ms", sw.ElapsedMilliseconds);

        sw.Reset();

        sw.Start();
        {
            DoTwo();
        }
        sw.Stop();

        Console.WriteLine("DoTwo() -> {0} ms", sw.ElapsedMilliseconds);

Upvotes: 5

Related Questions