Reputation: 34880

How is this virtual method call faster than the sealed method call?

I am doing some tinkering on the performance of virtual vs sealed members.

Below is my test code.

The output is

virtual total 3166ms
per call virtual 3.166ns
sealed total 3931ms
per call sealed 3.931ns

I must be doing something wrong because according to this the virtual call is faster than the sealed call.

I am running in Release mode with "Optimize code" turned on.

Edit: when running outside of VS (as a console app) the times are close to a dead heat. but the virtual almost always comes out in front.

[TestFixture]
public class VirtTests
{

    public class ClassWithNonEmptyMethods
    {
        private double x;
        private double y;

        public virtual void VirtualMethod()
        {
            x++;
        }
        public void SealedMethod()
        {
            y++;
        }
    }

    const int iterations = 1000000000;


    [Test]
    public void NonEmptyMethodTest()
    {

        var foo = new ClassWithNonEmptyMethods();
        //Pre-call
        foo.VirtualMethod();
        foo.SealedMethod();

        var virtualWatch = new Stopwatch();
        virtualWatch.Start();
        for (var i = 0; i < iterations; i++)
        {
            foo.VirtualMethod();
        }
        virtualWatch.Stop();
        Console.WriteLine("virtual total {0}ms", virtualWatch.ElapsedMilliseconds);
        Console.WriteLine("per call virtual {0}ns", ((float)virtualWatch.ElapsedMilliseconds * 1000000) / iterations);


        var sealedWatch = new Stopwatch();
        sealedWatch.Start();
        for (var i = 0; i < iterations; i++)
        {
            foo.SealedMethod();
        }
        sealedWatch.Stop();
        Console.WriteLine("sealed total {0}ms", sealedWatch.ElapsedMilliseconds);
        Console.WriteLine("per call sealed {0}ns", ((float)sealedWatch.ElapsedMilliseconds * 1000000) / iterations);

    }

}

Upvotes: 4

Answers (4)

GTRekter

Reputation: 1007

Using as reference for our test the following code, let's analyze the Microsoft intermediate language (MSIL) information generated by the compiler by using the Ildasm.exe (IL Disassembler) tool.

public sealed class Sealed
{
    public string Message { get; set; }
    public void DoStuff() { }
}
public class Derived : Base
{
    public sealed override void DoStuff() { }
}
public class Base
{
    public string Message { get; set; }
    public virtual void DoStuff() { }
}
static void Main()
{
    Sealed sealedClass = new Sealed();
    sealedClass.DoStuff();
    Derived derivedClass = new Derived();
    derivedClass.DoStuff();
    Base BaseClass = new Base();
    BaseClass.DoStuff();
}

To run this tool, open the Developer Command Prompt for Visual Studio and execute the command ildasm.

**********************************************************************
** Visual Studio 2017 Developer Command Prompt v15.9.13
** Copyright (c) 2017 Microsoft Corporation
**********************************************************************


C:\Program Files (x86)\Microsoft Visual Studio\2017\Community>ildasm

Once the application is started, load the executable (or assembly) of the previous application

No alt text provided for this image Double click on the Main method to view the Microsoft intermediate language (MSIL) information.

.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       41 (0x29)
  .maxstack  8
  IL_0000:  newobj     instance void ConsoleApp1.Program/Sealed::.ctor()
  IL_0005:  callvirt   instance void ConsoleApp1.Program/Sealed::DoStuff()
  IL_000a:  newobj     instance void ConsoleApp1.Program/Derived::.ctor()
  IL_000f:  callvirt   instance void ConsoleApp1.Program/Base::DoStuff()
  IL_0014:  newobj     instance void ConsoleApp1.Program/Base::.ctor()
  IL_0019:  callvirt   instance void ConsoleApp1.Program/Base::DoStuff()
  IL_0028:  ret
} // end of method Program::Main

As you can see each class use newobj to create a new instance by pushing an object reference onto the stack and callvirt to calls a late-bound of the DoStuff() method of its respective object.

Judging on this information seems that both sealed, derived and base classes are managed in the same way by the compiler. Just to be sure, let's get deeper by analyzing the JIT-compiled code with the Disassembly window in Visual Studio.

Enable the Disassembly by selecting Enable address-level debugging, under Tools > Options > Debugging > General.

No alt text provided for this image Set the a brake point at the beginning of the application and start the debug. Once the application hits the brake-point open the Disassembly window by selecting Debug > Windows > Disassembly.

--- C:\Users\Ivan Porta\source\repos\ConsoleApp1\Program.cs --------------------
        {
0066084A  in          al,dx  
0066084B  push        edi  
0066084C  push        esi  
0066084D  push        ebx  
0066084E  sub         esp,4Ch  
00660851  lea         edi,[ebp-58h]  
00660854  mov         ecx,13h  
00660859  xor         eax,eax  
0066085B  rep stos    dword ptr es:[edi]  
0066085D  cmp         dword ptr ds:[5842F0h],0  
00660864  je          0066086B  
00660866  call        744CFAD0  
0066086B  xor         edx,edx  
0066086D  mov         dword ptr [ebp-3Ch],edx  
00660870  xor         edx,edx  
00660872  mov         dword ptr [ebp-48h],edx  
00660875  xor         edx,edx  
00660877  mov         dword ptr [ebp-44h],edx  
0066087A  xor         edx,edx  
0066087C  mov         dword ptr [ebp-40h],edx  
0066087F  nop  
            Sealed sealedClass = new Sealed();
00660880  mov         ecx,584E1Ch  
00660885  call        005730F4  
0066088A  mov         dword ptr [ebp-4Ch],eax  
0066088D  mov         ecx,dword ptr [ebp-4Ch]  
00660890  call        00660468  
00660895  mov         eax,dword ptr [ebp-4Ch]  
00660898  mov         dword ptr [ebp-3Ch],eax  
            sealedClass.DoStuff();
0066089B  mov         ecx,dword ptr [ebp-3Ch]  
0066089E  cmp         dword ptr [ecx],ecx  
006608A0  call        00660460  
006608A5  nop  
            Derived derivedClass = new Derived();
006608A6  mov         ecx,584F3Ch  
006608AB  call        005730F4  
006608B0  mov         dword ptr [ebp-50h],eax  
006608B3  mov         ecx,dword ptr [ebp-50h]  
006608B6  call        006604A8  
006608BB  mov         eax,dword ptr [ebp-50h]  
006608BE  mov         dword ptr [ebp-40h],eax  
            derivedClass.DoStuff();
006608C1  mov         ecx,dword ptr [ebp-40h]  
006608C4  mov         eax,dword ptr [ecx]  
006608C6  mov         eax,dword ptr [eax+28h]  
006608C9  call        dword ptr [eax+10h]  
006608CC  nop  
            Base BaseClass = new Base();
006608CD  mov         ecx,584EC0h  
006608D2  call        005730F4  
006608D7  mov         dword ptr [ebp-54h],eax  
006608DA  mov         ecx,dword ptr [ebp-54h]  
006608DD  call        00660490  
006608E2  mov         eax,dword ptr [ebp-54h]  
006608E5  mov         dword ptr [ebp-44h],eax  
            BaseClass.DoStuff();
006608E8  mov         ecx,dword ptr [ebp-44h]  
006608EB  mov         eax,dword ptr [ecx]  
006608ED  mov         eax,dword ptr [eax+28h]  
006608F0  call        dword ptr [eax+10h]  
006608F3  nop  
        }
0066091A  nop  
0066091B  lea         esp,[ebp-0Ch]  
0066091E  pop         ebx  
0066091F  pop         esi  
00660920  pop         edi  
00660921  pop         ebp  

00660922  ret

As we can see in the previous code, while the creation of the objects is the same, the instruction executed to invoke the methods of the sealed and derived/base class are slightly different. After moving data into registers of the RAM (mov instruction), the invoke of the sealed method, execute a comparison between dword ptr [ecx] and ecx (cmp instruction) before actually call the method.

According to the report written by Torbj¨orn Granlund, Instruction latencies and throughput for AMD and Intel x86 processors, the speed of the following instruction in a Intel Pentium 4 are:

mov: has 1 cycle as latency and the processor can sustain 2.5 instructions per cycle of this type
cmp: has 1 cycle as latency and the processor can sustain 2 instructions per cycle of this type

In conclusion, the optimization of the now days compilers and processors have made the performances between sealed and not-sealed classed basically so little that are irrelevant to the majority of the applications.

References

NewObj: https://learn.microsoft.com/en-us/dotnet/api/system.reflection.emit.opcodes.newobj?view=netframework-4.8
Callvirt: https://learn.microsoft.com/en-us/dotnet/api/system.reflection.emit.opcodes.callvirt?view=netframework-4.8
Disassembly: https://learn.microsoft.com/en-us/visualstudio/debugger/how-to-use-the-disassembly-window?view=vs-2019
x86 instruction:
https://www.aldeid.com/wiki/X86-assembly/Instructions
Instruction latencies and throughput for AMD and Intel x86 processors: https://gmplib.org/~tege/x86-timing.pdf

Upvotes: 0

Hans Passant

Reputation: 942187

You are testing the effects of memory alignment on code efficiency. The 32-bit JIT compiler has trouble generating efficient code for value types that are more than 32 bits in size, long and double in C# code. The root of the problem is the 32-bit GC heap allocator, it only promises alignment of allocated memory on addresses that are a multiple of 4. That's an issue here, you are incrementing doubles. A double is efficient only when it is aligned on an address that's a multiple of 8. Same issue with the stack, in case of local variables, it is also aligned only to 4 on a 32-bit machine.

The L1 CPU cache is internally organized in blocks called a "cache line". There is a penalty when the program reads a mis-aligned double. Especially one that straddles the end of a cache line, bytes from two cache lines have to be read and glued together. Mis-alignment isn't uncommon in the 32-bit jitter, it is merely 50-50 odds that the 'x' field happens to be allocated on an address that's a multiple of 8. If it isn't then 'x' and 'y' are going to be misaligned and one of them may well straddle the cache line. The way you wrote the test, that's going to either make VirtualMethod or SealedMethod slower. Make sure you let them use the same field to get comparable results.

The same is true for code. Swap the code for the virtual and sealed test to arbitrarily change the outcome. I had no trouble making the sealed test quite a bit faster that way. Given the modest difference in speed, you are probably looking at a code alignment issue. The x64 jitter makes an effort to insert NOPs to get a branch target aligned, the x86 jitter doesn't.

You should also run the timing test several times in a loop, at least 20. You are likely to then also observe the effect of the garbage collector moving the class object. The double may have a different alignment afterward, dramatically changing the timing. Accessing a 64-bit value type value like long or double has 3 distinct timings, aligned on 8, aligned on 4 within a cache line, and aligned on 4 across two cache lines. In fast to slow order.

The penalty is steep, reading a double that straddles a cache line is roughly three times slower than reading an aligned one. Also the core reason why a double[] (array of doubles) is allocated in the Large Object Heap even when it has only 1000 elements, well south of the normal threshold of 80KB, the LOH has an alignment guarantee of 8. These alignment problems entirely disappear in code generated by the x64 jitter, both the stack and the GC heap have an alignment of 8.

Upvotes: 4

BCS

Reputation: 78673

You might be seeing some start up cost. Try wrapping the Test-A/Test-B code in a loop and run it several times. You might also be seeing some kind of ordering effects. To avoid that (and top/bottom of loop effects), unroll it 2-3 times.

Upvotes: 1

leppie

Reputation: 117310

~~First, you have to mark the method sealed.~~

Secondly, provide an override to the virtual method. Create an instance of the derived class.

As a third test, create a sealed override method.

Now you can start comparing.

Edit: You should probably run this outside VS.

Update:

Example of what I mean.

abstract class Foo
{
  virtual void Bar() {}
}

class Baz : Foo
{
  sealed override void Bar() {}
}

class Woz : Foo
{
  override void Bar() {}
}

Now test the call speed of Bar for an instance of Baz and Woz. I also suspect member and class visibility outside the assembly could affect JIT analysis.

Upvotes: 1

How is this virtual method call faster than the sealed method call?

Answers (4)

Related Questions