Reputation: 51
I'm new to Google Benchmark and get different results for the same benchmark (below), which retrieves the local time in C++, depending on whether I run it locally or on Quick-Bench.com. In both cases I used GCC 8.2 with -O3.
Why do the results vary dramatically between running locally vs on quick-bench.com? Which is correct?
#include <benchmark/benchmark.h>
#include <ctime>
#include <sys/time.h>
#include <chrono>

static void BM_ctime(benchmark::State& state) {
    unsigned long long count = 0;
    for (auto _ : state) {
        std::time_t sec = std::time(0);
        benchmark::DoNotOptimize(count += sec);
    }
}
BENCHMARK(BM_ctime);

static void BM_sysTime(benchmark::State& state) {
    unsigned long long count = 0;
    for (auto _ : state) {
        unsigned long sec = time(NULL);
        benchmark::DoNotOptimize(count += sec);
    }
}
BENCHMARK(BM_sysTime);

static void BM_chronoMilliseconds(benchmark::State& state) {
    unsigned long long count = 0;
    for (auto _ : state) {
        unsigned long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::system_clock::now().time_since_epoch()
        ).count();
        benchmark::DoNotOptimize(count += ms);
    }
}
BENCHMARK(BM_chronoMilliseconds);

static void BM_chronoSececonds(benchmark::State& state) {
    unsigned long long count = 0;
    for (auto _ : state) {
        unsigned long long sec = std::chrono::duration_cast<std::chrono::seconds>(
            std::chrono::system_clock::now().time_since_epoch()
        ).count();
        benchmark::DoNotOptimize(count += sec);
    }
}
BENCHMARK(BM_chronoSececonds);
Locally the following results are produced:
-------------------------------------------------------------
Benchmark                      Time           CPU Iterations
-------------------------------------------------------------
BM_ctime                     183 ns        175 ns    4082013
BM_sysTime                   197 ns        179 ns    4004829
BM_chronoMilliseconds         37 ns         36 ns   19092506
BM_chronoSececonds            37 ns         36 ns   19057991
QuickBench results:
Upvotes: 1
Views: 1564
Reputation: 382
I just ran your example on my machine and got the following result:
----------------------------------------------------------------
Benchmark                         Time           CPU Iterations
----------------------------------------------------------------
BM_ctime                       3.26 ns       3.25 ns  215110555
BM_sysTime                     3.26 ns       3.25 ns  215154791
BM_chronoMilliseconds          2502 ns       2502 ns     279856
BM_chronoSececonds             2502 ns       2501 ns     279854
Assuming that a NOP instruction takes 1 clock cycle, which is 0.5 ns on my system, the ratio CPU time / NoOp time is around 5000.
However, I would not be too concerned about this, because that is not what benchmarking is meant for, at least for me. It doesn't make sense to compare the values from my system with the values from Quick Bench. Rather, I use benchmark values to compare different implementations or algorithms on the same machine, which eliminates such doubts.
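To make the "compare on the same machine" idea concrete, here is a minimal sketch that expresses one measurement as a multiple of another taken on the same system. The helper name `relative_cost` is hypothetical and not part of Google Benchmark; the inputs are simply the CPU times printed in the tables in this thread.

```cpp
#include <cassert>

// Hypothetical helper (not a Google Benchmark API): returns how many times
// more expensive `candidate_ns` is than `baseline_ns`. Both values should
// come from the same machine and the same benchmark run.
double relative_cost(double candidate_ns, double baseline_ns) {
    assert(baseline_ns > 0.0);  // a zero baseline would make the ratio meaningless
    return candidate_ns / baseline_ns;
}
```

With the question's local CPU times, `relative_cost(175.0, 36.0)` is roughly 4.9, so `BM_ctime` looks about five times slower than the chrono variants there. With the CPU times from this answer, `relative_cost(3.25, 2502.0)` is roughly 0.0013, so the ordering is reversed. The NOP comparison above is the same calculation: `relative_cost(2502.0, 0.5)` gives about 5004. The ratios only say something about the machine they were measured on.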
Upvotes: 0
Reputation: 181
Benchmark results depend on the platform, architecture, and machine. It isn't even practical to assume your benchmarks will always give the same numbers on the same machine: factors such as temperature, performance scaling options, and wear and tear can all affect performance.
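One way to see this run-to-run variation is to repeat the same measurement several times and look at the spread rather than a single number. The sketch below does this by hand with `std::chrono::steady_clock`; the names `ns_per_call`, `Spread`, and `measure_spread` are hypothetical, and the approach is a simplified stand-in for what a benchmarking library does, not Google Benchmark's own method. (Google Benchmark can repeat a benchmark itself, for example with `->Repetitions(n)`.)

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical helper: average cost in nanoseconds of one std::time call,
// measured over `calls` back-to-back calls.
double ns_per_call(std::size_t calls) {
    assert(calls > 0);
    using clock = std::chrono::steady_clock;
    volatile std::time_t sink = 0;  // volatile store keeps the loop from being optimized away
    const auto start = clock::now();
    for (std::size_t i = 0; i < calls; ++i) {
        sink = std::time(nullptr);
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(calls);
}

// Smallest, middle, and largest sample of a repeated measurement.
struct Spread {
    double min;
    double median;  // middle element (upper median for even sample counts)
    double max;
};

// Repeats the measurement `repetitions` times and summarizes the samples.
Spread measure_spread(std::size_t repetitions, std::size_t calls) {
    assert(repetitions > 0);
    std::vector<double> samples;
    samples.reserve(repetitions);
    for (std::size_t r = 0; r < repetitions; ++r) {
        samples.push_back(ns_per_call(calls));
    }
    std::sort(samples.begin(), samples.end());
    return Spread{samples.front(), samples[samples.size() / 2], samples.back()};
}
```

Calling, say, `measure_spread(10, 100000)` a few times on the same machine will typically give slightly different `min`, `median`, and `max` values each time. The gap between `min` and `max` is a rough indication of how much noise to expect before reading anything into a difference between two implementations.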
Upvotes: 1