Juster

Reputation: 752

Two consecutive std::chrono::high_resolution_clock::now() calls give ~270ns difference

I want to measure the duration of a piece of code with a std::chrono clock, but the clock itself seems too heavy for measuring something that lasts only nanoseconds. This program:

#include <cstdio>
#include <chrono>

int main() {
    using clock = std::chrono::high_resolution_clock;

    // try several times
    for (int i = 0; i < 5; i++) {
        // two consecutive now() calls here, one right after another with nothing in between
        printf("%dns\n", (int)std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - clock::now()).count());
    }
    return 0;
}

It always gives me around 100-300ns. Is this because of the two syscalls? Is it possible to get a shorter interval between two now() calls? Thanks!

Environment: Linux Ubuntu 18.04, kernel 4.18, load average is low, stdlib is linked dynamically.

Upvotes: 0

Views: 2725

Answers (3)

Maxim Egorushkin

Reputation: 136515

Use the rdtsc instruction to measure time with the highest resolution and the least overhead possible:

#include <iostream>
#include <cstdint>

int main() {
    uint64_t a = __builtin_ia32_rdtsc();
    uint64_t b = __builtin_ia32_rdtsc();
    std::cout << b - a << " cpu cycles\n";
}

Output:

19 cpu cycles

To convert cycles to nanoseconds, divide the cycle count by the base CPU frequency in GHz. For example, on a 4.2GHz i7-7700k, divide by 4.2.

The TSC is a global counter in the CPU shared across all cores.

Modern CPUs have a constant TSC that ticks at the same rate regardless of the current CPU frequency and boost. Look for constant_tsc in the flags field of /proc/cpuinfo.

Also note that __builtin_ia32_rdtsc is more efficient than inline assembly; see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48877

Upvotes: 5

SergeyA

Reputation: 62613

Just do not use time clocks for nanosecond-scale benchmarks. Instead, use CPU ticks: on any hardware modern enough to worry about nanoseconds, CPU ticks are monotonic, steady and synchronized between cores.

Unfortunately, C++ does not expose a CPU tick clock, so you'd have to use the RDTSC instruction directly (it can be nicely wrapped in an inline function, or you can use the compiler's intrinsics). The difference in CPU ticks can also be converted into time if you so desire (by using the CPU frequency), but normally for such low-latency benchmarks it is not necessary.

Upvotes: 2

AlbertM

Reputation: 1296

If you want to measure the duration of very fast code snippets, it is generally a good idea to run them many times and take the average over all runs. The ~200ns that you mention then becomes negligible, because it is spread across all runs.

Example:

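#include <cstdio>
#include <chrono>

// Placeholder: replace the body with the code you want to time.
void functionYouWantToTime() {}

int main() {
    // Declared locally: a global alias named "clock" clashes with ::clock from <ctime>.
    using clock = std::chrono::high_resolution_clock;

    const int n = 10000; // adjust depending on the expected runtime of your code
    auto start = clock::now();
    for (int i = 0; i < n; ++i)
        functionYouWantToTime();
    // End minus start; the reverse order would give a negative duration.
    auto result =
        std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start).count() / n;
    printf("%lld ns per call\n", (long long)result);
    return 0;
}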

Upvotes: 2
