Reputation: 752
I want to measure duration of a piece of code with a std::chrono
clock, but it seems too heavy to measure something that lasts nanoseconds. That program:
#include <cstdio>
#include <chrono>
int main() {
using clock = std::chrono::high_resolution_clock;
// try several times
for (int i = 0; i < 5; i++) {
// two consequent now() here, one right after another without anything in between
printf("%dns\n", (int)std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - clock::now()).count());
}
return 0;
}
Always gives me around 100-300ns. Is this because of two syscalls? Is it possible to have less duration between two now()? Thanks!
Environment: Linux Ubuntu 18.04, kernel 4.18, load average is low, stdlib is linked dynamically.
Upvotes: 0
Views: 2725
Reputation: 136515
Use rdtsc
instruction to measure times with the highest resolution and the least overhead possible:
#include <iostream>
#include <cstdint>
int main() {
uint64_t a = __builtin_ia32_rdtsc();
uint64_t b = __builtin_ia32_rdtsc();
std::cout << b - a << " cpu cycles\n";
}
Output:
19 cpu cycles
To convert the cycles to nanoseconds divide cycles by the base CPU frequency in GHz. For example, for a 4.2GHz i7-7700k divide by 4.2.
TSC is a global counter in the CPU shared across all cores.
Modern CPUs have a constant TSC that ticks at the same rate regardless of the current CPU frequency and boost. Look for constant_tsc
in /proc/cpuinfo
, flags
field.
Also note, that __builtin_ia32_rdtsc
is more effective than the inline assembly, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48877
Upvotes: 5
Reputation: 62613
Just do not use time clocks for nanoseconds benchmark. Instead, use CPU ticks - on any hardware modern enough to worry about nanoseconds, CPU ticks are monotonic, steady and synchronized between cores.
Unfortunately, C++ does not expose CPU tick clock, so you'd have to use RDTSC instruction directly (it can be nicely wrapped in the inline function or you can use compiler's intrinsics). The difference in CPU ticks could also be converted into time if you so desire (by using CPU frequency), but normally for such a low-latency benchmarks it is not necessary.
Upvotes: 2
Reputation: 1296
If you want to measure the duration of very fast code snippets it is generally a good idea to run them multiple times and take the average time of all runs, the ~200ns that you mention will be negligible then because they are distributed over all runs.
Example:
#include <cstdio>
#include <chrono>
using clock = std::chrono::high_resolution_clock;
auto start = clock::now();
int n = 10000; // adjust depending on the expected runtime of your code
for (unsigned int i = 0; i < n; ++i)
functionYouWantToTime();
auto result =
std::chrono::duration_cast<std::chrono::nanoseconds>(start - clock::now()).count() / n;
Upvotes: 2