Ruup

Reputation: 177

Optimization of virtual table lookups

With code like the example below, can a compiler tell that a in fact points to an instance of B and optimize away the virtual table lookup?

#include <iostream>

class A
{
  public:
    virtual void f()
    {
        std::cout << "A::f()" << std::endl;
    }
};

class B : public A
{
  public:
    void f()
    {
        std::cout << "B::f()" << std::endl;
    }
};

int main()
{
    B b;
    A* a = &b;
    a->f();

    return 0;
}

Additional question after the answers of Jonathan Seng and reima: if GCC is used, are any flags needed to make it optimize away the vtable lookup?

Upvotes: 5

Views: 1702

Answers (2)

reima

Reputation: 2126

Clang can easily make this optimization, and even inlines the function call. This can be seen from the generated assembly:

Dump of assembler code for function main():
   0x0000000000400500 <+0>: push   %rbp
   0x0000000000400501 <+1>: mov    %rsp,%rbp
   0x0000000000400504 <+4>: mov    $0x40060c,%edi
   0x0000000000400509 <+9>: xor    %al,%al
   0x000000000040050b <+11>:  callq  0x4003f0 <printf@plt>
   0x0000000000400510 <+16>:  xor    %eax,%eax
   0x0000000000400512 <+18>:  pop    %rbp
   0x0000000000400513 <+19>:  retq   

I took the liberty of replacing std::cout << … with equivalent calls to printf, as this greatly reduces the clutter in the disassembly.

GCC 4.6 can also deduce that no vtable lookup is needed, but does not inline:

Dump of assembler code for function main():
   0x0000000000400560 <+0>: sub    $0x18,%rsp
   0x0000000000400564 <+4>: mov    %rsp,%rdi
   0x0000000000400567 <+7>: movq   $0x4007c0,(%rsp)
   0x000000000040056f <+15>:  callq  0x400680 <B::f()>
   0x0000000000400574 <+20>:  xor    %eax,%eax
   0x0000000000400576 <+22>:  add    $0x18,%rsp
   0x000000000040057a <+26>:  retq   
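To make the distinction concrete, here is a minimal sketch (not taken from the builds above) of three situations an optimizer may treat differently. The names call_known_type, call_unknown_type, call_final_type and check_dispatch are hypothetical, and f() returns a label instead of printing. Whether a given compiler actually devirtualizes each case depends on its version and optimization level; as far as I know, GCC documents the relevant pass as -fdevirtualize, which is enabled at -O2 and above, so inspect the disassembly rather than trusting the comments.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the question's classes; f() returns a label
// instead of printing so the result is easy to check.
struct A {
    virtual ~A() = default;
    virtual std::string f() const { return "A::f()"; }
};

struct B final : A {
    std::string f() const override { return "B::f()"; }
};

// Case 1: the object is created in the same function, so the optimizer
// can see that *a is a B and may call B::f() directly (or inline it).
std::string call_known_type() {
    B b;
    A* a = &b;
    return a->f();
}

// Case 2: the dynamic type is unknown here. Unless the call is inlined
// into a caller that reveals the type, expect a normal vtable lookup.
std::string call_unknown_type(const A& a) {
    return a.f();
}

// Case 3: B is marked final, so nothing can override B::f() and a call
// through a B reference can be bound without consulting the vtable.
std::string call_final_type(const B& b) {
    return b.f();
}

// The observable behaviour is the same however the call is dispatched.
void check_dispatch() {
    B b;
    assert(call_known_type() == "B::f()");
    assert(call_unknown_type(b) == "B::f()");
    assert(call_final_type(b) == "B::f()");
}
```

Calling check_dispatch() from main confirms only the behaviour. To see which calls were turned into direct calls, compile with optimization and compare the disassembly of the three functions, as done for main above.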

Upvotes: 7

Jonathan Seng

Reputation: 1219

Maybe it can; that depends on how smart the compiler is and on the optimization settings it is given.

But this is one call. Why do you care about optimizing this one call? And if you do care, why not just get the type right for this one call?
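One way to read "get the type right" is sketched below, as an illustration rather than a recommendation. The helper names call_on_object, call_qualified and check_calls are hypothetical. Calling f() on the object itself leaves the compiler nothing to look up, and a qualified call such as a.A::f() is defined by the language to skip virtual dispatch entirely.

```cpp
#include <cassert>
#include <string>

struct A {
    virtual ~A() = default;
    virtual std::string f() const { return "A::f()"; }
};

struct B : A {
    std::string f() const override { return "B::f()"; }
};

// Calling on the object itself: static and dynamic type are both B,
// so no lookup is needed to pick B::f().
std::string call_on_object() {
    B b;
    return b.f();
}

// A qualified call names one specific function and suppresses virtual
// dispatch, so this returns "A::f()" even when a refers to a B.
std::string call_qualified(const A& a) {
    return a.A::f();
}

void check_calls() {
    B b;
    assert(call_on_object() == "B::f()");
    assert(call_qualified(b) == "A::f()");
}
```

Note that the qualified form changes which function runs, not just how it is found, so it is only appropriate when that specific override is really the one you want.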

The first answer to every question about optimization is, "Why do you need to optimize that?" If a profiler reports that 50% of the application's time is spent in one place, the question is answered. "Oh, but it's inefficient," the most common answer, leads to unmaintainable code and rarely speeds up the code that is actually inefficient.

Upvotes: -2
