Performance differences in run-time polymorphism between C and C++

Question

I know benchmarking is a very delicate subject and simple, not-well-thought-out benchmarks are mostly meaningless for performance comparisons, but what I have right now is actually a pretty small and contrived example that I think should be easily explainable. So, even if the question seems unhelpful, it would at least help me in understanding benchmarking.

So, here I go.

I was trying to experiment with simple API design in C, using run-time polymorphism kind of behaviour via void *. Then I compared it with same thing implemented in C++ using regular virtual functions. Here is the code:

#include 
#include 
#include 

int dummy_computation()
{
    return 64 / 8;
}

/* animal library, everything is prefixed with al for namespacing */
#define AL_SUCCESS 0;
#define AL_UNKNOWN_ANIMAL 1;
#define AL_IS_TYPE_OF(animal, type) \
    strcmp(((type *)animal)->animal_type, #type) == 0\

typedef struct {
    const char* animal_type;
    const char* name;
    const char* sound;
} al_dog;

inline int make_dog(al_dog** d) {
    *d = (al_dog*) malloc(sizeof(al_dog));
    (*d)->animal_type = "al_dog";
    (*d)->name = "leslie";
    (*d)->sound = "bark";
    return AL_SUCCESS;
}

inline int free_dog(al_dog* d) {
    free(d);
    return AL_SUCCESS;
}
    
typedef struct {
    const char* animal_type;
    const char* name;
    const char* sound;
} al_cat;

inline int make_cat(al_cat** c) {
    *c = (al_cat*) malloc(sizeof(al_cat));
    (*c)->animal_type = "al_cat";
    (*c)->name = "garfield";
    (*c)->sound = "meow";
    return AL_SUCCESS;
}

inline int free_cat(al_cat* c) {
    free(c);
    return AL_SUCCESS;
}

int make_sound(void* animal) {
    if(AL_IS_TYPE_OF(animal, al_cat)) {
        al_cat *c = (al_cat*) animal;
        return dummy_computation();
    } else if(AL_IS_TYPE_OF(animal, al_dog)) {
        al_dog *d = (al_dog*) animal;
        return dummy_computation();
    } else {
        printf("unknown animal
");
        return 0;
    }
}
/* c style library finishes here */

/* cpp library with OOP */
struct animal {
    animal(const char* n, const char* s) 
    :name(n)
    ,sound(s)
    {} 
    virtual int make_sound() {
        return dummy_computation();
    }
    const char* name;
    const char* sound;
};

struct cat : animal {
    cat() 
    :animal("garfield", "meow")
    {}
};

struct dog : animal {
    dog() 
    :animal("leslie", "bark")
    {}
};
/* cpp library finishes here */

I have something called dummy_computation, just to make sure I get some computational thingy going on in the benchmark. I would normally implement different printf calls for barking, meowing etc. for such an example but printf is not easily benchmarkable in quick-benchmarks.com. The actual thing I want to benchmark is run-time polymorphism implementation. So that's why I chose to make some small function and used it in both C and C++ implementation as a filler.

Now, in quick-benchmarks.com, I have a benchmark like following:

static void c_style(benchmark::State& state) {
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
    al_dog* d = NULL;
    al_cat* c = NULL;

    make_dog(&d);
    make_cat(&c);
    
    int i1 = make_sound(d);
    benchmark::DoNotOptimize(i1);
    int i2 = make_sound(c);
    benchmark::DoNotOptimize(i2);

    free_dog(d);
    free_cat(c);
  }
}
// Register the function as a benchmark
BENCHMARK(c_style);

static void cpp_style(benchmark::State& state) {
  for (auto _ : state) {
    animal* a1 = new dog();
    animal* a2 = new cat();
    int i1 = a1->make_sound();
    benchmark::DoNotOptimize(i1);
    int i2 = a2->make_sound();
    benchmark::DoNotOptimize(i2);
    delete a1;
    delete a2;
  }
}
BENCHMARK(cpp_style);

I added DoNotOptimize calls so that virtual calls would not end up being optimized-out.

Whole benchmark can be found here, if recreating it seems painful.

https://quick-bench.com/q/ezul9hDXTjfSWijCfd2LMUUEH1I

Now, to my surprise, C version comes out 27 times faster in the results. I expected maybe some performance hits on C++ version because it is a more refined solution but definitely not 27-fold.

Can someone explain these results? Do virtual function calls really incur this much overhead compared to C? Or is it the way I set up this benchmarking experiment that is completely meaningless? If so, how would one more correctly benchmark such issues?

Performance differences in run-time polymorphism between C and C++

Answers (1)

Related Questions