Optimize (recompile) inherited virtual methods for each derived class

Question

Let's say we have a "master" class with a method called "Bulk" that performs N iterations, each calling a virtual method.

This virtual method may be overridden by many classes, but each override is final: no class overrides it a second time. For performance reasons we have to minimize the cost of the calls and of the vtable resolution as much as we can (example: generating network packets at 10+ Gb/s).

One of my ideas was to make the method Bulk virtual and "somehow" force it to be recompiled for each derived class, so that we would do only one vtable lookup instead of N and would also benefit from inlining, SSE, etc. However, when I read the generated ASM, all I get is a generic "Bulk" method that still goes through the vtable N times.

Do you know any way to force that recompilation (without copy-pasting its code into each derived class, of course), or any other way to reduce the number of calls and vtable lookups? I thought requirements like this would come up frequently, but I did not find anything...

Example code to play around:

master.hpp

#pragma once

class master
{
public:
    virtual unsigned Bulk(unsigned n)
    {
        unsigned ret = 0;
        for (unsigned i = 0; i < n; ++i)
            ret += once();

        return ret;
    }

    virtual unsigned once() = 0;
};

derived1.hpp

#pragma once
#include "master.hpp"

class derived1 final: public master
{
    virtual inline unsigned once() final { return 7; }
};

derived2.hpp

#pragma once
#include "master.hpp"

class derived2 final: public master
{
    virtual inline unsigned once() final { return 5; }
};

main.cpp

#include "derived1.hpp"
#include "derived2.hpp"
#include <iostream>
using namespace std;

int main()
{
    derived1 d1;
    derived2 d2;

    cout << d1.Bulk(144) << endl;
    cout << d2.Bulk(144) << endl;

    return 0;
}

Compile command I'm using: g++ main.cpp -S -O3 --std=gnu++17

Compiled Bulk Loop:

    movq    0(%rbp), %rax
    movq    %rbp, %rdi
    call    *8(%rax)
    addl    %eax, %r12d
    subl    $1, %ebx
    jne .L2

463035818_is_not_an_ai · Accepted Answer

I don't fully understand your question ;)

However, I suggest avoiding virtual dispatch when you don't want virtual dispatch, instead of trying to optimize around the virtual table (which is an implementation detail, so such optimizations won't be portable). Maybe CRTP (the Curiously Recurring Template Pattern) is an option.

In case you also want to use the derivedX classes polymorphically, you can add a common base class:

#include <iostream>
#include <string>
using namespace std;

struct base {
    virtual std::string Bulk(unsigned n) = 0;
    virtual ~base(){}
};

template <typename T>
struct master : base {
    std::string Bulk(unsigned n) override {
        std::string ret = "";
        auto ptr = static_cast<T*>(this);  // T is the concrete derived class (CRTP)
        for (unsigned i = 0; i < n; ++i) ret += ptr->once();
        return ret;
    }
};

struct derived1 final : public master<derived1> {
    std::string once() { return "a"; }
};

struct derived2 final : public master<derived2> {
    std::string once() { return "b"; }
};

int main()
{
    derived1 d1;
    derived2 d2;

    cout << d1.Bulk(3) << endl;
    cout << d2.Bulk(3) << endl;
}