How to measure difference in performance between virtual function+inheritance and std::function member w/o inheritance?

Question

tl;dr

Is the benchmark I present below a fair way to compare inheritance-based vs std::function-based approach to polymorphism?

Full question

If one needs different objects that implement the same interface in different ways, and also needs to be able to put them in a container and swap one for another at run-time, the most popular solution is to use inheritance:

struct Base {
    virtual void f() = 0;
    virtual ~Base() = default;
};
struct Derived1 : Base {
    void f() override;
};
struct Derived2 : Base {
    void f() override;
};

Another solution is to have a single class, but swap the virtual method for a std::function:

struct Foo {
    std::function<void()> f{};
};
auto foo1 = Foo{[]{ return /* impl like Derived1 */; }};
auto foo2 = Foo{[]{ return /* impl like Derived2 */; }};

(Some questions about the difference between the two approaches are here, here, and here.)
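To make the "container plus run-time swap" requirement concrete, here is a minimal sketch of both approaches side by side. The `counter` variable and the `runInheritance`/`runFunction` helpers are hypothetical names introduced only for illustration; they are not part of the benchmark.

```cpp
#include <functional>
#include <memory>
#include <vector>

// Hypothetical counter, only here to make the calls observable.
int counter = 0;

struct Base {
    virtual void f() = 0;
    virtual ~Base() = default;
};
struct Derived1 : Base {
    void f() override { counter += 1; }
};
struct Derived2 : Base {
    void f() override { counter += 2; }
};

struct Foo {
    std::function<void()> f{};
};

// Inheritance: heterogeneous objects live behind pointers to Base.
int runInheritance() {
    counter = 0;
    std::vector<std::unique_ptr<Base>> objs;
    objs.push_back(std::make_unique<Derived1>());
    objs.push_back(std::make_unique<Derived2>());
    objs[0] = std::make_unique<Derived2>();  // swap the implementation at run-time
    for (auto const& o : objs) {
        o->f();
    }
    return counter;
}

// std::function: one concrete type, stored by value.
int runFunction() {
    counter = 0;
    std::vector<Foo> objs;
    objs.push_back(Foo{[]{ counter += 1; }});
    objs.push_back(Foo{[]{ counter += 2; }});
    objs[0].f = []{ counter += 2; };  // swap the implementation at run-time
    for (auto const& o : objs) {
        o.f();
    }
    return counter;
}
```

In the first case the container holds pointers and swapping means replacing the pointee; in the second the container holds `Foo` values and swapping means assigning a new callable to the member.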

However, regardless of other pros and cons of either solution, I'm curious to measure the difference in performance with a benchmark.

I understand that the performance will obviously vary based on how std::function is implemented as well as on the compiler and the options passed to it, the operating system, and who knows what else.

But with all these factors being held fixed, I think one can measure the difference in performance, if it exists at all.

I should clarify that my intention is to see first-hand that the difference between the two approaches is indeed negligible except in very peculiar use cases, as I've understood from the linked questions and other sources. Or to prove that my understanding is wrong and there is indeed an important difference in performance.

My attempt to write a benchmark is here:

A few explanations about various bits of it:

All the f functions above modify, each in a different way, a global unsigned int,
```
unsigned int RETURN{};
```
which I return from main, to make sure that the bodies of those functions cannot be optimized away;

I've changed Derived1::f/Derived2::f and the bodies of foo1/foo2's lambdas (with respect to the snippets above) so that they modify the aforementioned global unsigned int:

struct Base {
    virtual void f() = 0;
    virtual ~Base() = default;
};
struct Derived1 : Base {
    void f() override { RETURN += 1; }
};
struct Derived2 : Base {
    void f() override { RETURN += 2; }
};

struct Foo {
    std::function<void()> f{};
};
auto const foo1 = Foo{[]{ RETURN += 1; }};
auto const foo2 = Foo{[]{ RETURN += 2; }};

Before the measurement code, I generate random bools that I use to randomly pick between Derived1/foo1 and Derived2/foo2:

std::random_device rd;
std::mt19937 gen{rd()};
std::bernoulli_distribution randBool{0.5};
constexpr int N = 1000000;

std::array<bool, N> bools;
for (bool& b : bools) {
    b = randBool(gen);
}

I use Boost.Hana to conveniently loop over a tuple of two compile-time true/false values, which allows parametrising over the two cases, and Range-v3 to conveniently accumulate the time measured for each call of the virtual function/std::function:

using Time = duration;

std::array<Time, 2> times; // 0: std::function-based, 1: inheritance-based

hana::for_each(hana::make_basic_tuple(hana::false_c, hana::true_c), [&](auto hb) {
    constexpr bool B = hb;
    auto const elapsed = ranges::accumulate(bools, Time{}, [](auto acc, auto b){
        /* time measurement */;
    });
    times[!B] = elapsed;
});

The function to pick based on that run-time boolean b is the following, which is also templated on the compile-time boolean B used to pick between the two cases being compared:

template<bool B>
constexpr auto bool2Obj = []{
    if constexpr (B) {
        return [](bool b){
            return b
                ? foo1
                : foo2;
        };
    } else {
        using BasePtr = std::unique_ptr<Base>;
        return [](bool b){
            return b
                ? BasePtr{std::make_unique<Derived1>()}
                : BasePtr{std::make_unique<Derived2>()};
        };
    }
}();

Once the object is picked, its method is called via the following, which is templated on bool B for the same reason as above, i.e. allowing picking each of the two cases being compared:

template<bool B>
constexpr auto call = []{
    if constexpr (B) {
        return [](Foo const& p){ p.f(); };
    } else {
        return [](std::unique_ptr<Base> const& p){ p->f(); };
    }
}();

The /* time measurement */ is the following:

auto obj = bool2Obj<B>(b);
auto const start = high_resolution_clock::now();
call<B>(obj);
auto const end = high_resolution_clock::now() - start;
return acc + Time{end};

where I've kept the random-picking of the object out of the measurement, leaving in the measurement only the call.
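For readers who want to try the same measurement without Boost.Hana or Range-v3, the pieces above can be sketched as below. This is only a rough approximation of the scheme described here, not the linked benchmark itself: `Time` is assumed to be seconds stored as `double`, a `std::vector<bool>` replaces the `std::array`, and `accumulateCallTime`, `randomBools`, `timeFunctionBased` and `timeInheritanceBased` are hypothetical helper names.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <memory>
#include <random>
#include <vector>

unsigned int RETURN{};  // same role as the global described above

struct Base {
    virtual void f() = 0;
    virtual ~Base() = default;
};
struct Derived1 : Base { void f() override { RETURN += 1; } };
struct Derived2 : Base { void f() override { RETURN += 2; } };

struct Foo { std::function<void()> f{}; };

using Clock = std::chrono::high_resolution_clock;
using Time  = std::chrono::duration<double>;  // assumption: seconds as double

// Times only `call(obj)`; picking the object stays outside the timed region.
template <typename Pick, typename Call>
Time accumulateCallTime(std::vector<bool> const& bools, Pick pick, Call call) {
    Time total{};
    for (bool b : bools) {
        auto obj = pick(b);
        auto const start = Clock::now();
        call(obj);
        total += Time{Clock::now() - start};
    }
    return total;
}

// Hypothetical helper: n random picks, each true with probability 0.5.
std::vector<bool> randomBools(std::size_t n) {
    std::mt19937 gen{std::random_device{}()};
    std::bernoulli_distribution randBool{0.5};
    std::vector<bool> bools(n);
    for (std::size_t i = 0; i != n; ++i) {
        bools[i] = randBool(gen);
    }
    return bools;
}

Time timeFunctionBased(std::vector<bool> const& bools) {
    Foo const foo1{[]{ RETURN += 1; }};
    Foo const foo2{[]{ RETURN += 2; }};
    return accumulateCallTime(
        bools,
        [&](bool b) { return b ? foo1 : foo2; },  // returns a copy of the Foo
        [](Foo const& p) { p.f(); });
}

Time timeInheritanceBased(std::vector<bool> const& bools) {
    using BasePtr = std::unique_ptr<Base>;
    return accumulateCallTime(
        bools,
        [](bool b) {
            return b ? BasePtr{std::make_unique<Derived1>()}
                     : BasePtr{std::make_unique<Derived2>()};
        },
        [](BasePtr const& p) { p->f(); });
}
```

A caller would generate the picks once, e.g. `auto const bools = randomBools(1000000);`, pass the same vector to both functions, and return `RETURN` from `main` so that the calls stay observable.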

The result,

with the few repetitions that I can run before the processes are killed for timeout, CompilerExplorer seems to tell me that the two approaches have roughly the same performance, as the percentage I'm printing, which is (i - f) / i (where i and f are the runtimes of the inheritance-based and function-based approaches, respectively), often changes sign; this is the case with both Clang and GCC

however, executing the program on my machine, while GCC seems to behave similarly, Clang (18.1.8) consistently returns positive results like the following, suggesting that the std::function-based approach is faster:
0.0648057 0.0716398 0.0636759 0.0649676 0.0673908 0.0756509 0.0780861 0.0890416 0.090532 0.094767

on a third hand, QuickBench (for which I had to ditch Boost and Range-v3) seems to consistently support that the std::function-based approach is faster, for GCC, Clang + LLVM, and Clang + GNU.
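For reference, the figure printed above is the relative difference (i - f) / i. A minimal sketch of that computation follows; the function name `relativeDifference` is hypothetical and the runtimes are assumed to be plain `double` values in the same unit.

```cpp
// Relative difference (i - f) / i between the two runtimes.
// Positive: the std::function-based run took less time than the
// inheritance-based one; negative: the other way round.
double relativeDifference(double inheritanceTime, double functionTime) {
    return (inheritanceTime - functionTime) / inheritanceTime;
}
```

With this convention, a value such as 0.0648057 means the std::function-based run took about 6.5% less time than the inheritance-based one.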
