Reputation: 541

Is clock() in c++ consistent with heavy CPU loads

Right now I basically have a program that uses clock to test the amount of time my program takes to do certain operations and usually it is accurate to a couple milliseconds. My question is this: If the CPU is under heavy load will I still get the same results?

Does clock only count when the CPU is working on my process?

Let's assume a multi-core CPU, but a process that does not take advantage of multithreading.

Upvotes: 3

Answers (3)

myaut

Reputation: 11514

Some CPU components are shared, such as the last-level cache (between cores) and the execution units (between hardware threads within one core), so under heavy load you will see jitter. Even if your application executes exactly the same number of instructions, each instruction may take more cycles (waiting for memory because data was evicted from the cache, or waiting for an available execution unit), and more cycles means more time to execute (assuming that Turbo Boost does not compensate).

If you are looking for a precise instrument, look at hardware performance counters.

When interpreting timing metrics for CPU-intensive tasks, it is also important to consider the number of cores available on the physical CPU, hyper-threading, other BIOS settings such as Turbo Boost on Intel CPUs, and the threading techniques used in the code.

Parallelization tools like OpenMP provide built-in functions for measuring wall time, such as omp_get_wtime(), which are often more useful than clock() in programs that use this type of parallelization (where clock() reports CPU time, that time is summed over all threads of the process).

Upvotes: 2

Mats Petersson

Reputation: 129524

What clock measures depends on the OS. On Windows, because of a decision made long ago, clock gives the elapsed time; on most other OSes (certainly Linux, macOS and other Unix-related OSes) it gives the CPU time used by the process.

Depending on what you actually want to achieve, elapsed time or CPU time may be what you want to measure.
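To see which of the two your platform's clock reports, here is a minimal sketch that sleeps and then busy-waits, recording both std::clock() and a wall clock. The names Timing and measure_sleep_then_spin are made up for this illustration, and std::chrono::steady_clock is used as the wall-clock reference.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical result type for this sketch.
struct Timing {
    double cpu_seconds;   // difference of std::clock() readings
    double wall_seconds;  // difference of steady_clock readings (elapsed time)
};

// Hypothetical helper: sleep for sleep_ms (no CPU used), then busy-wait
// for roughly spin_ms (CPU used), and report both measurements.
Timing measure_sleep_then_spin(int sleep_ms, int spin_ms) {
    assert(sleep_ms >= 0 && spin_ms >= 0);

    const std::clock_t cpu_begin = std::clock();
    const auto wall_begin = std::chrono::steady_clock::now();

    // Phase 1: the process is idle, so it should accumulate almost no CPU time.
    std::this_thread::sleep_for(std::chrono::milliseconds(sleep_ms));

    // Phase 2: the process keeps one core busy until the deadline passes.
    const auto spin_until =
        std::chrono::steady_clock::now() + std::chrono::milliseconds(spin_ms);
    volatile unsigned long sink = 0;  // volatile so the loop is not optimized away
    while (std::chrono::steady_clock::now() < spin_until) {
        sink = sink + 1;
    }

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    return {
        static_cast<double>(cpu_end - cpu_begin) / CLOCKS_PER_SEC,
        std::chrono::duration<double>(wall_end - wall_begin).count()
    };
}
```

If clock counts CPU time, cpu_seconds should come out close to the spin phase alone; if it counts elapsed time, it should be close to wall_seconds. On a loaded machine the busy-wait itself may be preempted, so expect some variation between runs.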

In a system where there are other processes running, the difference between elapsed time and CPU time may be huge. And of course, when your application is NOT keeping the CPU busy (e.g. it is waiting for network packets to go down the wire or for file data from the hard disk), that elapsed time is "available" for other applications.

There are also a huge number of error factors/interference factors when there are other processes running in the same system:

If we assume that your OS supports clock as a measure of CPU-time, the precision here is not always that great - for example, it may well be accounted in terms of CPU-timer ticks, and your process may not run for the "full tick" if it's doing I/O for example.
Other processes may use "your" CPU for parts of the interrupt handling, before the OS has switched to "account for this as interrupt time", when dealing with packets over the network or hard-disk I/O for some percentage of the time [typically not huge amounts, but in a very busy system, it can be several percent of the total time]. And if other processes run on "your" CPU, the time to reload the cache(s) with "your" process' data after the other process loaded its data will be accounted on "your time". This sort of "interference" may very well affect your measurements; how much depends very much on "what else" is going on in the system.
If your process shares data [via shared memory] with another process, some time will also be spent dealing with "cache-snoop requests" between your process and the other process, during which your process doesn't get to execute (again, typically a minute amount, but in extreme cases it can be significant).
If the OS is switching tasks, "half" of the time spent switching to/from your task will be accounted in your process, and half in the other process being switched in/out. Again, this is usually a tiny amount, but if you have a very busy system with lots of process switches, it can add up.
Some processor features, e.g. Intel's Hyper-Threading, also share resources with your actual core, so only SOME of the time on that core is spent in your process, and the cache is now shared between your process and some other process' data and instructions, meaning your process' data MAY get "evicted" from the cache by the other thread running on the same CPU core.
Likewise, multicore CPUs often have a shared L3 cache that gets affected by other processes running on the other cores of the CPU.
File-caching and other "system caches" will also be affected by other processes - so if your process is reading some file(s), and other processes also access file(s), the cache-content will be "less yours" than if the system wasn't so busy.

For accurate measurements of how much of the system resources your process uses, you need processor performance counters (and a reproducible test-case, because you probably need to run the same setup several times to ensure that you get the "right" combination of performance counters). Of course, most of these counters are ALSO system-wide, and some kinds of processing, for example in interrupts, and other random interference will affect the measurement, so the most accurate results will be if you DON'T have many other (busy) processes running in the system.

Of course, in MANY cases, just measuring the overall time of your application is perfectly adequate. Again, as long as you have a reproducible test-case that gives the same (or at least similar) timing each time it's run in a particular scenario.
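One way to check whether a test-case gives similar timings from run to run is to repeat it and look at the spread. The sketch below is only an illustration: RunStats and time_repeated are hypothetical names, and reporting the minimum and the median is one common choice among several.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical summary type for this sketch.
struct RunStats {
    double min_ms;     // fastest run, least disturbed by other processes
    double median_ms;  // middle run (upper middle when the count is even)
};

// Hypothetical helper: call work() `runs` times and time each call
// with a wall clock (std::chrono::steady_clock).
template <typename F>
RunStats time_repeated(F&& work, int runs) {
    assert(runs > 0);

    std::vector<double> samples_ms;
    samples_ms.reserve(static_cast<std::size_t>(runs));

    for (int i = 0; i < runs; ++i) {
        const auto beg = std::chrono::steady_clock::now();
        work();
        const auto end = std::chrono::steady_clock::now();
        samples_ms.push_back(
            std::chrono::duration<double, std::milli>(end - beg).count());
    }

    // Sort so that the minimum is first and the median is in the middle.
    std::sort(samples_ms.begin(), samples_ms.end());
    return { samples_ms.front(), samples_ms[samples_ms.size() / 2] };
}
```

A large gap between min_ms and median_ms suggests that other activity in the system is interfering with the measurement.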

Each application is different, each system is different. Performance measurement is a HUGE subject, and it's very hard to cover EVERYTHING - and of course, we're not here to answer extremely specific questions about "how do I get my PI-with-a-million-decimals to run faster when there are other processes running in the same system" or whatever it may be.

Upvotes: 4

Escualo

Reputation: 42182

In addition to agreeing with the responses indicating that timings depend on many factors, I would like to bring to your attention the std::chrono library available since C++11:

#include <chrono>
#include <iostream>

int main() {
    // Read the clock before and after the code being timed.
    auto beg = std::chrono::high_resolution_clock::now();
    std::cout << "*** Displaying Some Stuff ***" << std::endl;
    auto end = std::chrono::high_resolution_clock::now();
    // Convert the difference to whole microseconds for printing.
    auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end - beg);
    std::cout << "Elapsed: " << dur.count() << " microseconds" << std::endl;
}

As per the standard, high_resolution_clock is the clock with the shortest tick period provided by the implementation (it may be an alias for system_clock or steady_clock). The duration_cast above reports the elapsed time in microseconds (other units are available; see the docs).

Sample run:

$ g++ example.cpp -std=c++14 -Wall -Wextra -O3
$ ./a.out
*** Displaying Some Stuff ***
Elapsed: 29 microseconds

While it is much more verbose than relying on the C-style std::clock(), I feel it gives you much more expressiveness, and you can hide the verbosity behind a nice interface (for example, see my answer to a previous post where I use std::chrono to build a function timer).
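As an illustration of hiding the verbosity, here is a minimal sketch of a function timer. It is not the code from the post mentioned above; time_call is a hypothetical name, and steady_clock is chosen here because it never jumps backwards.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: call f(args...) once and return the elapsed
// wall time in microseconds. Any return value of f is discarded.
template <typename F, typename... Args>
std::chrono::microseconds time_call(F&& f, Args&&... args) {
    const auto beg = std::chrono::steady_clock::now();
    std::forward<F>(f)(std::forward<Args>(args)...);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - beg);
}
```

A call such as time_call(my_function, 42).count() then yields the number of microseconds as an integer, where my_function stands for whatever you want to time.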

Upvotes: 4
