hash
hash

Reputation: 131

First called method faster than second one

I had a look at the .NET Framework Sourcecode and stumbled on the implementation of LINQ-Sum

int Sum(this IEnumerable<int> source)

I saw that it was implemented with a foreach-loop and wondered why the guys at MS don't use a normal for-loop due to performance reasons (later I learnt that there is no longer a performance difference between a for-loop and a foreach-loop - but I didn't know that just until now).

So I copied the MS-implementation in my own project and wrote a little benchmark:

var range = Enumerable.Range(1, 1000);
Stopwatch sw = new Stopwatch();

//Do sth unimportant for warming up

sw.Start();
for(int i = 0; i <= 10000; i++)
{
    long z = i + 3;
}
sw.Stop();

//Implementation 1

sw.Reset();
sw.Start();
for (int i = 0; i <= 1000000; i++)
{
    long i1 = range.Sum1();
}
sw.Stop();
Console.WriteLine("Sum1: " + sw.ElapsedTicks.ToString());

//Implementation 2

sw.Reset();
sw.Start();
for (int i = 0; i <= 1000000; i++)
{
    long i2 = range.Sum2();
}
sw.Stop();
Console.WriteLine("Sum2: " + sw.ElapsedTicks.ToString());

And here are the two implementations of Sum (Note: both are identical, I first wanted to check if the measuring is working correctly):

public static class LinqExtension
{
    public static int Sum1(this IEnumerable<int> source)
    {
        int sum = 0;
        checked
        {
            foreach (int v in source) sum += v;
        }
        return sum;
    }

    public static int Sum2(this IEnumerable<int> source)
    {
        int sum = 0;
        checked
        {
            foreach (int v in source) sum += v;
        }
        return sum;
    }
}

Surprisingly I got two different results : Sum1 = 16043441 vs. Sum2 = 17480907

So I extended the benchmark a little bit and called Sum1 and Sum2 not just once, but multiple times in the following order:

  1. Sum1: 16035534
  2. Sum2: 17381296
  3. Sum2: 17441259
  4. Sum1: 16021378
  5. Sum1: 16000879
  6. Sum1: 15989672
  7. Sum2: 17342804
  8. Sum2: 17347417 ...

Hence Sum1 is always nearly 10% faster than Sum2. When I call Sum2 first, the result are contrary.

What causes these performance differences? Why is the first called method faster than the second one? Is my benchmark invalid?

I'm using Visual Studio 2015 CTP4 and .NET Framework 4.5.3

EDIT:

Results in milliseconds instead of ticks

  1. Sum1: 7714 ms
  2. Sum2: 8336 ms
  3. Sum2: 8321 ms
  4. Sum1: 7686 ms
  5. Sum1: 7693 ms
  6. Sum1: 7686 ms
  7. Sum2: 8372 ms
  8. Sum2: 8302 ms ...

Thanks to the comments, I fixed some mistakes and now the code looks like that:

sw.Start();
for (int i = 0; i <= 1000000; i++)
{
   i1 = range.Sum1();
}
sw.Stop();
Console.WriteLine("Sum1: " + sw.ElapsedMilliseconds.ToString() + "\n" + i1.ToString());

Now the results are totally different:

  1. Sum1: 8021 ms
  2. Sum2: 7587 ms
  3. Sum2: 7660 ms
  4. Sum1: 7989 ms
  5. Sum1: 8041 ms
  6. Sum1: 8038 ms
  7. Sum2: 7609 ms
  8. Sum2: 7613 ms

But there is still a difference, yet now the other way around.


Another update:

When I use

int[] range = new int[1000];
for (int m = 0; m < range.Length; m++)
            range[m] = m+1;

instead of

var range = Enumerable.Range(1, 1000);

both methods are equally fast.

  1. Sum1: 6966 ms
  2. Sum2: 6986 ms
  3. Sum2: 7045 ms
  4. Sum1: 7039 ms
  5. Sum1: 6932 ms
  6. Sum1: 7064 ms
  7. Sum2: 7023 ms
  8. Sum2: 7026 ms

Update: Tested it with Mono(SharpDevelop) and VS2013 and I got perfectly consistent results. So I think using VS2015 wasn't a great idea, since it's still a beta. Therefore the significance of the results is pretty low.


Another Update:

stakx commented:

Try calling each of your Sum1 and Sum2 methods at least once before you start measuring time, in order to make sure that the methods' code has been generated by the JIT. Otherwise you might be including the time required for JIT code generation in your benchmarking

So I called Sum1 and Sum2 one time before the measurings and suprisingly this solves the problem. But I don't understand why. I understand that generating the code by the JIT costs some time, but only the first time. In my test I have 20 for-loops, each of them calling Sum1 respectively Sum2 1.000.000 times. I do a measuring for every loop, and get constantly different values for Sum1 and Sum2. It would make sense, if the very first loop is slower, but that's not the case.

I've used ngen.exe to generate a native image and got the following results:

  1. Sum1: 6517 ms
  2. Sum2: 6837 ms
  3. Sum2: 6817 ms
  4. Sum1: 6511 ms
  5. Sum1: 6513 ms
  6. Sum1: 6513 ms
  7. Sum2: 6822 ms
  8. Sum2: 6942 ms ...

So there is still this difference.

Very important: It is NOT always the first method which is faster! Sometimes it's the first called method, sometimes the second one. But once the assembly was built the results are reproducible. It's pretty confusing for me and I can't see any pattern, when this happens.

Enigmativity:

Did you ever try swapping the order in which you called the methods? Calling Sum2 first?

Yeah, but then the result are just inverse. If Sum1 was the "fast method", after swapping, Sum2 is the fast one and Sum1 is the slow one.

Upvotes: 1

Views: 290

Answers (1)

t3chb0t
t3chb0t

Reputation: 18665

I modified your test code a little bit and I found that there are two factors that have an impact on the performance: whether you run over the same or two different collections and the type of the enumeration (I don't know why yet).

Enumerating a List<int> seems to be the slowest case. An array int[] is the fastest one and there is actually no difference between the two when you use two different ranges (there is however always a difference when lists are used):

static void Main(string[] args)
{
    // Try with .ToList() and .ToArray()
    var range1 = Enumerable.Range(1, 1000);
    var range2 = Enumerable.Range(1, 1000);

    int numberOfSums = 100000;
    int numberOfTests = 3;

    for (int i = 0; i < numberOfTests; i++)
    {
        SumBenchmark(range1, LinqExtension.Sum1, numberOfSums, "Sum1");
    }

    for (int i = 0; i < numberOfTests; i++)
    {
        // Also try with range1
        SumBenchmark(range2, LinqExtension.Sum2, numberOfSums, "Sum2");
    }

    Console.ReadKey();
}

static void SumBenchmark(IEnumerable<int> numbers, Func<IEnumerable<int>, int> sum, int numberOfSums, string name)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < numberOfSums; i++)
    {
        long result = sum(numbers);
    }
    sw.Stop();
    Console.WriteLine("{2}: {0} ticks in {1} ms ", sw.ElapsedTicks.ToString(), sw.ElapsedMilliseconds.ToString(), name);
}

For me on the cotrary the second call is always faster.


EDIT: there is a huge performace hit if you disable the Prefer 32-bit option in the build settings and compile it as 64-bit - then the original Sum runs much faster:

Prefer 32-bit

...however it runs at the same speed without .ToList() and .ToArray()


EDIT-2: here's one more result where the Sum2 uses an int[] not an IEnumerable:

Sum1: in 878 ms 
Sum1: in 863 ms 
Sum1: in 875 ms 
Sum2: in 122 ms 
Sum2: in 122 ms 
Sum2: in 121 ms 
Linq: in 830 ms 
Linq: in 825 ms 
Linq: in 836 ms 

The generated IL is also different:

for

public static int Sum2(this int[] source)

it's

.method public hidebysig static int32  Sum2(int32[] source) cil managed
{
  .custom instance void [mscorlib]System.Runtime.CompilerServices.ExtensionAttribute::.ctor() = ( 01 00 00 00 ) 
  // Code size       28 (0x1c)
  .maxstack  2
  .locals init ([0] int32 sum,
           [1] int32 v,
           [2] int32[] CS$6$0000,
           [3] int32 CS$7$0001)
  IL_0000:  ldc.i4.0
  IL_0001:  stloc.0
  IL_0002:  ldarg.0
  IL_0003:  stloc.2
  IL_0004:  ldc.i4.0
  IL_0005:  stloc.3
  IL_0006:  br.s       IL_0014
  IL_0008:  ldloc.2
  IL_0009:  ldloc.3
  IL_000a:  ldelem.i4
  IL_000b:  stloc.1
  IL_000c:  ldloc.0
  IL_000d:  ldloc.1
  IL_000e:  add.ovf
  IL_000f:  stloc.0
  IL_0010:  ldloc.3
  IL_0011:  ldc.i4.1
  IL_0012:  add
  IL_0013:  stloc.3
  IL_0014:  ldloc.3
  IL_0015:  ldloc.2
  IL_0016:  ldlen
  IL_0017:  conv.i4
  IL_0018:  blt.s      IL_0008
  IL_001a:  ldloc.0
  IL_001b:  ret
} // end of method LinqExtension::Sum2

and for

public static int Sum1(this IEnumerable<int> source)

it's

.method public hidebysig static int32  Sum1(class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> source) cil managed
{
  .custom instance void [mscorlib]System.Runtime.CompilerServices.ExtensionAttribute::.ctor() = ( 01 00 00 00 ) 
  // Code size       44 (0x2c)
  .maxstack  2
  .locals init ([0] int32 sum,
           [1] int32 v,
           [2] class [mscorlib]System.Collections.Generic.IEnumerator`1<int32> CS$5$0000)
  IL_0000:  ldc.i4.0
  IL_0001:  stloc.0
  IL_0002:  ldarg.0
  IL_0003:  callvirt   instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0> class [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator()
  IL_0008:  stloc.2
  .try
  {
    IL_0009:  br.s       IL_0016
    IL_000b:  ldloc.2
    IL_000c:  callvirt   instance !0 class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current()
    IL_0011:  stloc.1
    IL_0012:  ldloc.0
    IL_0013:  ldloc.1
    IL_0014:  add.ovf
    IL_0015:  stloc.0
    IL_0016:  ldloc.2
    IL_0017:  callvirt   instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
    IL_001c:  brtrue.s   IL_000b
    IL_001e:  leave.s    IL_002a
  }  // end .try
  finally
  {
    IL_0020:  ldloc.2
    IL_0021:  brfalse.s  IL_0029
    IL_0023:  ldloc.2
    IL_0024:  callvirt   instance void [mscorlib]System.IDisposable::Dispose()
    IL_0029:  endfinally
  }  // end handler
  IL_002a:  ldloc.0
  IL_002b:  ret
} // end of method LinqExtension::Sum1

after all iterating an IEnumerable argument is not too fast... and the optimization takes place only if the collection of the foreach loop is not an IEnumerable. DEMO

Upvotes: 1

Related Questions