Reputation: 2582
I have the problem that there are quite often delays in code execution which I cannot explain. By delays I mean that executing a piece of code which should take constant time sometimes takes much longer.
I attached a small C program which does some "dummy" calculations on the CPU core 1. The thread is pinned to this core. I've executed it on a Ubuntu 18.04 machine with 192 GiB RAM and 96 CPU cores. This machine does nothing else.
The tool runs only one thread (the main thread is sleeping), and at least the perf
tool shows no context switches, so that should not be the problem.
The output of the tool looks like this (it is shown more or less every second):
...
Stats:
Max [us]: 883
Min [us]: 0
Avg [us]: 0.022393
...
These statistics always show the results for 1'000'000 runs. My question is: why is the maximum value always that big? The 99.99% quantiles are often huge as well (I did not add them to the example to keep the code small; the max shows this behavior just as well). Why does this happen, and how can I avoid it? In some applications this variance is a real problem for me.
Given that nothing else is running, these values are hard for me to understand.
Thank you very much
main.c:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdbool.h>
#include <sys/time.h>
#include <pthread.h>
#include <sys/sysinfo.h>

static inline unsigned long now_us()
{
    struct timeval tx;
    gettimeofday(&tx, NULL);
    /* Cast before multiplying so the seconds part cannot overflow an int */
    return (unsigned long)tx.tv_sec * 1000000 + tx.tv_usec;
}

static inline int calculate(int x)
{
    /* Do something "expensive" */
    for (int i = 0; i < 1000; ++i) {
        x = (~x * x + (1 - x)) ^ (13 * x);
        x += 2;
    }
    return x;
}

static void *worker(void *arg)
{
    (void)arg;
    const int runs_per_measurement = 1000000;
    int dummy = 0;
    while (true) {
        long max_us = -1;
        long min_us = -1;
        long sum_us = 0; /* long: an int could overflow once spikes accumulate */
        for (int i = 0; i < runs_per_measurement; ++i) {
            const unsigned long start_us = now_us();
            dummy = calculate(dummy);
            const long runtime_us = now_us() - start_us;
            /* Update stats */
            if (max_us < runtime_us) {
                max_us = runtime_us;
            }
            if (min_us < 0 || min_us > runtime_us) {
                min_us = runtime_us;
            }
            sum_us += runtime_us;
        }
        printf("Stats:\n");
        printf("  Max [us]: %ld\n", max_us);
        printf("  Min [us]: %ld\n", min_us);
        printf("  Avg [us]: %f\n", (double)sum_us / runs_per_measurement);
        printf("\n");
    }
    return NULL;
}

int main()
{
    pthread_t worker_thread;
    if (pthread_create(&worker_thread, NULL, worker, NULL) != 0) {
        printf("Cannot create thread!\n");
        return 1;
    }
    /* Pin the worker thread to CPU core 1 */
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(1, &cpuset);
    if (pthread_setaffinity_np(worker_thread, sizeof(cpuset), &cpuset) != 0) {
        printf("Cannot set cpu core!\n");
        return 1;
    }
    pthread_join(worker_thread, NULL);
    return 0;
}
Makefile:
main: main.c
	gcc -o $@ $^ -Ofast -lpthread -Wall -Wextra -Werror
Upvotes: 0
Views: 111
Reputation: 51
This is an excellent example of how multitasking works within an operating system.
As stated in the comments above:
"This machine does nothing else" --> absurd. Run ps -e to get an idea of all the other things your machine is doing. – John Bollinger
This is achieved by the operating system (specifically the kernel) letting one task run for a while, then pausing it and allowing another to run.
So effectively your code runs for a bit, is halted while others run, then runs for a bit, and so on.
This accounts for the variation in times you see, as you are measuring elapsed wall-clock time, not CPU time (time actually spent running). C has some standard functions for measuring CPU time, such as this one from GNU.
CPU Scheduling is covered in more detail here
Finally, in order not to be pre-empted, you would need to run your code in kernel-space, on bare metal, or within a 'real-time' operating system. (I'll let you google what those terms mean :-) )
The only other solution would be to explore Linux/Unix 'nice' values (I'll let you google this as well, but basically they assign a higher or lower priority to your process).
If this sort of thing interests you there is an excellent book by Robert Love titled 'Linux Kernel Development'
Upvotes: 1