rizwanhudda

Reputation: 207

measuring precise time in nanoseconds C++

I wanted to test a way to measure the precise execution time of a piece of code in nanoseconds (accuracy up to 100 nanoseconds is OK) in C++.

I tried using chrono::high_resolution_clock for this purpose. To test whether it is working properly, I do the following:

  1. Get current time in nanoseconds using high_resolution_clock, call it "start"
  2. sleep for "x" nanoseconds using nanosleep(x)
  3. Get current time in nanoseconds using high_resolution_clock, call it "end"
  4. Now "end" - "start" should be roughly the same as "x". Let's call this difference "diff"

I ran the above test for x varying from 10 to 1000000. The diff comes out to be around 100000 ns (i.e., 100 microseconds).

Whereas it shouldn't be more than, say, 100 nanoseconds. Please help me fix this.

#include <ctime>     // timespec, nanosleep (POSIX)
#include <cstdint>   // int64_t
#include <unistd.h>
#include <iostream>
#include <chrono>

using namespace std;

int main() {
    int sleep_ns[] = {10, 50, 100, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000};
    int n = sizeof(sleep_ns)/sizeof(int);
    for (int i = 0; i < n; i++) {
        // Timestamp immediately before the sleep.
        auto start = std::chrono::high_resolution_clock::now();
        timespec tspec = {0, sleep_ns[i]};  // {seconds, nanoseconds}
        nanosleep(&tspec, NULL);
        // Timestamp immediately after the sleep.
        auto end = std::chrono::high_resolution_clock::now();
        chrono::duration<int64_t, nano> dur_ns = (end - start);
        int64_t measured_ns = dur_ns.count();
        // How much longer the sleep took than requested.
        int64_t diff = measured_ns - sleep_ns[i];
        cout << "diff: " << diff
             << " sleep_ns: " << sleep_ns[i]
             << " measured_ns: " << measured_ns << endl;
    }
    return 0;
}

The following was the output of this code on my machine, which is running Ubuntu 16.04.4 LTS:

diff: 172747 sleep_ns: 10 measured_ns: 172757
diff: 165078 sleep_ns: 50 measured_ns: 165128
diff: 164669 sleep_ns: 100 measured_ns: 164769
diff: 163855 sleep_ns: 500 measured_ns: 164355
diff: 163647 sleep_ns: 1000 measured_ns: 164647
diff: 162207 sleep_ns: 2000 measured_ns: 164207
diff: 160904 sleep_ns: 5000 measured_ns: 165904
diff: 155709 sleep_ns: 10000 measured_ns: 165709
diff: 145306 sleep_ns: 20000 measured_ns: 165306
diff: 115915 sleep_ns: 50000 measured_ns: 165915
diff: 125983 sleep_ns: 100000 measured_ns: 225983
diff: 115470 sleep_ns: 200000 measured_ns: 315470
diff: 115774 sleep_ns: 500000 measured_ns: 615774
diff: 116473 sleep_ns: 1000000 measured_ns: 1116473

Upvotes: 1

Views: 2766

Answers (2)

Jerry Coffin

Reputation: 490048

Here's part of the description of nanosleep:

If the interval specified in req is not an exact multiple of the granularity of the underlying clock (see time(7)), then the interval will be rounded up to the next multiple. Furthermore, after the sleep completes, there may still be a delay before the CPU becomes free to once again execute the calling thread.

The behavior you're getting seems to fit pretty well with the description.

For extremely short pauses, you're probably going to have to do some (most?) of the work on your own. The system clock source will often have a granularity of a microsecond or so.
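
If you want to see what granularity your system's clock advertises, you can query it with clock_getres. A minimal sketch, assuming a POSIX system (note this reports the advertised resolution, not the actual scheduling latency):

#include <ctime>    // clock_getres, timespec (POSIX)
#include <cstdio>

int main() {
    timespec res;
    // Ask the kernel for the advertised resolution of the monotonic clock.
    if (clock_getres(CLOCK_MONOTONIC, &res) == 0)
        printf("CLOCK_MONOTONIC resolution: %ld s %ld ns\n",
               (long)res.tv_sec, res.tv_nsec);
    return 0;
}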

One possible way to pause for less than the system clock time would be to measure how often you can execute a loop before the clock changes. During startup (for example) do that a few times, to get a good idea of how many loops you can execute per microsecond.

Then to pause for some fraction of that time, you can do linear interpolation to guess at a number of times to execute a loop to get about the same length of pause.

Note: this will generally run the CPU at 100% for the duration of the pause, so you only want to do it for really short pauses--up to a microsecond or two is fine, but if you want much more than that, you probably want to fall back to nanosleep.
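
Here's a minimal sketch of that calibrate-and-interpolate idea. The names (spin, calibrate_loops_per_us, spin_pause_ns) are hypothetical, and steady_clock stands in for whatever clock source you prefer:

#include <chrono>
#include <cstdint>
#include <iostream>

// The busy loop that is both calibrated and used for pausing.
static void spin(int64_t iterations) {
    for (volatile int64_t i = 0; i < iterations; ++i)
        ; // burn cycles
}

// Estimate how many spin() iterations fit in one microsecond.
// Call this a few times at startup and keep a representative value.
static int64_t calibrate_loops_per_us() {
    using clock = std::chrono::steady_clock;
    const int64_t probe = 10000000;  // arbitrary probe size
    auto start = clock::now();
    spin(probe);
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  clock::now() - start).count();
    return probe * 1000 / ns;        // iterations per microsecond
}

// Busy-wait for roughly ns nanoseconds by linear interpolation.
static void spin_pause_ns(int64_t ns, int64_t loops_per_us) {
    spin(loops_per_us * ns / 1000);
}

int main() {
    int64_t loops_per_us = calibrate_loops_per_us();
    std::cout << "calibrated: " << loops_per_us << " loops/us\n";
    spin_pause_ns(500, loops_per_us);  // pause for roughly 500 ns
}

The volatile loop counter keeps the compiler from optimizing the loop away. In practice you'd also want to recalibrate occasionally, since CPU frequency scaling can change how fast the loop runs.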

Even with that, however, you need to be aware that a pause could end up substantially longer than you planned. The OS does time slicing. If your process' time slice expires in the middle of your pause loop, it could easily be tens of milliseconds (or more) before it's scheduled to run again.

If you really need an assurance of response times on this order, you'll probably need to consider another OS (but even that's not a panacea--what you're asking for isn't trivial, regardless of how you approach it).

Reference

nanosleep man page

Upvotes: 1

Xirema

Reputation: 20386

What you're trying to do is not going to work on every platform, or even most platforms. There are a couple of reasons why.

The first, and biggest, reason is that measuring the precise time at which code executes is, by its very nature, imprecise. It requires a black-box OS call, and if you've ever looked at how those calls are implemented, it's quickly apparent that the technique has inherent imprecision. On Windows, this is done by reading both the current "tick" of the processor and its reported frequency, and dividing the elapsed ticks by that frequency to determine how many nanoseconds have passed between two successive calls. But Windows only reports with microsecond accuracy to begin with, and if the CPU changes its frequency, even if only modestly (which is common in modern CPUs, which lower their frequency to save power when not being maxed out), that can skew results.
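
For illustration, here's roughly what that looks like through the Windows API, as a sketch (QueryPerformanceCounter and QueryPerformanceFrequency are the real calls; the conversion math is the point):

#include <windows.h>
#include <cstdint>
#include <cstdio>

int main() {
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);  // counter ticks per second
    QueryPerformanceCounter(&t0);
    // ... code being timed ...
    QueryPerformanceCounter(&t1);
    // Elapsed ticks divided by ticks-per-second, scaled to nanoseconds.
    int64_t elapsed_ns = (t1.QuadPart - t0.QuadPart) * 1000000000LL / freq.QuadPart;
    printf("elapsed: %lld ns\n", (long long)elapsed_ns);
    return 0;
}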

Linux also has similar quirks, and every OS is at the mercy of the CPU's ability to accurately report its own tick counter/tick rate.

The second reason you're seeing results like the ones you've observed is that, for reasons similar to the first, putting a thread to sleep is usually very imprecise. CPUs usually can't sleep with better than microsecond precision, and it's often not possible to sleep for less than about half a millisecond at a time. Your particular environment seems to be capable of at least a few hundred microseconds of precision, but it's clearly no more precise than that. Some environments will even drop nanosecond resolution altogether.

Altogether, it's probably a mistake to presume that, without programming for an explicitly real-time OS and using that OS's specific API, you can get the kind of precision you're expecting. If you want reliable information about the timing of individual snippets of code, you'll need to run said code over and over, time the entire batch of executions, and then take the average to get a broad idea of the timing of each run. It'll still be imprecise, but it'll help get around these limitations.
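
A minimal sketch of that averaging approach, timing one large batch rather than each run individually (work() is a hypothetical stand-in for the snippet being measured):

#include <chrono>
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for the code being measured.
static void work() { /* ... */ }

int main() {
    const int64_t runs = 1000000;
    auto start = std::chrono::steady_clock::now();
    for (int64_t i = 0; i < runs; ++i)
        work();
    auto end = std::chrono::steady_clock::now();
    auto total_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    // One pair of clock reads per batch, so the clock's own overhead
    // and granularity are amortized across all the runs.
    std::cout << "avg: " << total_ns / runs << " ns per run\n";
    return 0;
}

In real use you'd also need to keep the compiler from optimizing the measured code away, for example by consuming its result.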

Upvotes: 3
