talekeDskobeDa

Reputation: 382

The time shown in the Google Benchmark results does not make sense

I am benchmarking some example functions on my processor, where each core runs at 2 GHz. Here are the functions being benchmarked; they are also available on Quick Bench.

#include <benchmark/benchmark.h>

#include <stdlib.h>
#include <time.h>
#include <cstdint>
#include <memory>
#include <string>

class Base
{
  public:       
   virtual int addNumVirt( int x ) { return (i + x); }
   int addNum( int x ) { return (x + i); }
   virtual ~Base() {}

  private:
   uint32_t i{10};
};

class Derived : public Base
{
  public:
   // Overrides of virtual functions are always virtual
   int addNumVirt( int x ) override { return (x + i); }
   int addNum( int x ) { return (x + i); }

  private:
   uint32_t i{20};
};

static void BM_nonVirtualFunc(benchmark::State &state)
{
 srand(time(0));
 volatile int x = rand();
 std::unique_ptr<Derived> derived = std::make_unique<Derived>();
 for (auto _ : state)
 {
   auto result = derived->addNum( x );
   benchmark::DoNotOptimize(result);
 }
}
BENCHMARK(BM_nonVirtualFunc);

static void BM_virtualFunc(benchmark::State &state)
{
 srand(time(0));
 volatile int x = rand();
 std::unique_ptr<Base> derived = std::make_unique<Derived>();
 for (auto _ : state)
 {
   auto result = derived->addNumVirt( x );
   benchmark::DoNotOptimize(result);
 }
}
BENCHMARK(BM_virtualFunc);

static void StringCreation(benchmark::State& state) {
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
    std::string created_string("hello");
    // Make sure the variable is not optimized away by compiler
    benchmark::DoNotOptimize(created_string);
  }
}
// Register the function as a benchmark
BENCHMARK(StringCreation);

static void StringCopy(benchmark::State& state) {
  // Code before the loop is not measured
  std::string x = "hello";
  for (auto _ : state) {
    std::string copy(x);
  }
}
BENCHMARK(StringCopy);

Below are the Google Benchmark results.

Run on (64 X 2000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x32)
  L1 Instruction 64K (x32)
  L2 Unified 512K (x32)
  L3 Unified 8192K (x8)
Load Average: 0.08, 0.04, 0.00
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_nonVirtualFunc      0.490 ns        0.490 ns   1000000000
BM_virtualFunc         0.858 ns        0.858 ns    825026009
StringCreation          2.74 ns         2.74 ns    253578500
StringCopy              5.24 ns         5.24 ns    132874574

The results show execution times of 0.490 ns and 0.858 ns for the first two functions. What I do not understand is this: if my core runs at 2 GHz, one cycle takes 0.5 ns, which makes the first result seem unreasonable.

I know that the time shown is an average over the number of iterations, so such a low execution time means that most of the samples took less than 0.5 ns, i.e. less than a single cycle.

What am I missing?

Edit 1: From the comments, it seems that adding a constant i to x was not a good idea. In fact, I started by calling std::cout inside the virtual and non-virtual functions. That helped me understand that virtual functions are not inlined and that the call has to be resolved at run time.

However, printing from the functions being benchmarked clutters the terminal output. (Is there a way to share my code from Godbolt?) Can anyone propose an alternative to printing something inside the function?
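A minimal sketch of what the printing version looked like (reconstructed, not my exact code):

// Inside Base: the stream output is an observable side effect, so the
// compiler cannot fold the call away or delete the loop body.
// Requires #include <iostream>.
virtual int addNumVirt( int x )
{
  std::cout << "Base::addNumVirt\n";
  return (i + x);
}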

Upvotes: 3

Views: 2808

Answers (1)

AlexGeorg

Reputation: 1037

Modern compilers just do magnificent things. Not always the most predictable things, but usually good things. You can see that either by inspecting the generated assembly, as suggested in the comments, or by reducing the optimization level: -O1 makes BM_nonVirtualFunc equivalent to BM_virtualFunc in terms of CPU time, and -O0 raises all of your functions to a similar level. (Edit: of course in a bad way; do not draw performance conclusions from unoptimized builds.)
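One way to watch that magic without lowering the optimization level is to hide information from the optimizer. Below is a hypothetical variant of BM_virtualFunc (my naming, reusing the Base/Derived classes from the question): laundering the pointer through DoNotOptimize means the compiler can no longer prove that the dynamic type is Derived, so it may not devirtualize or inline the call.

static void BM_virtualFuncOpaquePtr(benchmark::State &state)
{
  std::unique_ptr<Base> owner = std::make_unique<Derived>();
  Base *ptr = owner.get();
  benchmark::DoNotOptimize(ptr);  // optimizer must now assume ptr could point anywhere
  volatile int x = 42;
  for (auto _ : state)
  {
    auto result = ptr->addNumVirt( x );
    benchmark::DoNotOptimize(result);  // keep the result observable
  }
}
BENCHMARK(BM_virtualFuncOpaquePtr);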

And yeah, when I first used Quick Bench I was confused by "DoNotOptimize" as well. A better name might have been "UseResult()", to signal what it is actually meant to do when benchmarking.
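To illustrate the difference it makes (the benchmark names below are mine, not from the library or the question):

// Without DoNotOptimize the loop body has no observable effect, so the
// optimizer is allowed to delete it and you end up measuring an empty loop.
static void BM_deadCode(benchmark::State &state)
{
  for (auto _ : state)
  {
    int result = 1 + 2;
    (void)result;
  }
}
BENCHMARK(BM_deadCode);

// With DoNotOptimize the input is treated as unknown and the result as used,
// so the addition actually has to happen on every iteration.
static void BM_usedResult(benchmark::State &state)
{
  int x = 42;
  for (auto _ : state)
  {
    benchmark::DoNotOptimize(x);
    int result = x + 10;
    benchmark::DoNotOptimize(result);
  }
}
BENCHMARK(BM_usedResult);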

Upvotes: 1
