Kevin Meier

Reputation: 2582

Where do code execution "latencies" come from?

I have the problem that there are quite often delays in code execution which I cannot explain. By delays I mean that executing a piece of code which should take constant time sometimes takes much longer.

I have attached a small C program which does some "dummy" calculations on CPU core 1. The thread is pinned to this core. I executed it on an Ubuntu 18.04 machine with 192 GiB of RAM and 96 CPU cores. This machine does nothing else.

The tool only runs one worker thread (the main thread just waits in pthread_join), and perf reports no context switches, so that should not be the problem.
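To double-check the perf result from inside the program, something like the following rough sketch could be used; it is not part of the tool below, RUSAGE_THREAD is Linux-specific (it needs _GNU_SOURCE), and print_context_switches is just an illustrative helper name. Calling it before and after a measurement batch shows whether the thread was preempted in between.

#define _GNU_SOURCE

#include <stdio.h>
#include <sys/resource.h>

/* Rough sketch: read the calling thread's context-switch counters. */
static void print_context_switches(const char *label)
{
    struct rusage ru;
    if (getrusage(RUSAGE_THREAD, &ru) == 0) {
        /* ru_nvcsw: voluntary switches (the thread blocked),
           ru_nivcsw: involuntary switches (the thread was preempted) */
        printf("%s: voluntary=%ld involuntary=%ld\n",
               label, ru.ru_nvcsw, ru.ru_nivcsw);
    }
}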

The output of the tool looks like this (a new block is printed roughly every second):

...

Stats:
 Max [us]: 883
 Min [us]: 0
 Avg [us]: 0.022393

...

Each statistics block covers 1'000'000 runs. My question is: why is the maximum value always that large? The 99.99% quantiles are often huge as well (I left them out of the example to keep the code small; the maximum shows the behavior well enough). Why does this happen, and how can I avoid it? In some applications this "variance" is a real problem for me.

Given that nothing else is running on the machine, it is hard for me to understand these values.

Thank you very much

main.c:

#define _GNU_SOURCE

#include <stdio.h>
#include <stdbool.h>
#include <sys/time.h>
#include <pthread.h>
#include <sys/sysinfo.h>

static inline unsigned long now_us()
{
    struct timeval tx;
    gettimeofday(&tx, NULL);
    return tx.tv_sec * 1000000 + tx.tv_usec;
}

static inline int calculate(int x)
{
    /* Do something "expensive" */
    for (int i = 0; i < 1000; ++i) {
        x = (~x * x + (1 - x)) ^ (13 * x);
        x += 2;
    }
    return x;
}

static void *worker(void *arg)
{
    (void)arg;

    const int runs_per_measurement = 1000000;
    int dummy = 0;
    while (true) {
        int max_us = -1;
        int min_us = -1;
        int sum_us = 0;
        for (int i = 0; i < runs_per_measurement; ++i) {
            const long start_us = now_us();
            dummy = calculate(dummy);
            const long runtime_us = now_us() - start_us;
            
            /* Update stats */
            if (max_us < runtime_us) {
                max_us = runtime_us;
            }
            if (min_us < 0 || min_us > runtime_us) {
                min_us = runtime_us;
            }
            sum_us += runtime_us;
        }
        printf("Stats:\n");
        printf(" Max [us]: %d\n", max_us);
        printf(" Min [us]: %d\n", min_us);
        printf(" Avg [us]: %f\n", (double)sum_us / runs_per_measurement);
        printf("\n");
    }

    return NULL;
}

int main()
{
    pthread_t worker_thread;

    if (pthread_create(&worker_thread, NULL, worker, NULL) != 0) {
        printf("Cannot create thread!\n");
        return 1;
    }

    /* Use CPU number 1 */
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(1, &cpuset);

    if (pthread_setaffinity_np(worker_thread, sizeof(cpuset), &cpuset) != 0) {
        printf("Cannot set cpu core!\n");
        return 1;
    }

    pthread_join(worker_thread, NULL);

    return 0;
}

Makefile:

main: main.c
    gcc -o $@ $^ -Ofast -lpthread -Wall -Wextra -Werror

Upvotes: 0

Views: 111

Answers (1)

Tristan Carlson

Reputation: 51

This is a good example of how multitasking works within an operating system.

As stated in the comments above:

"This machine does nothing else" --> absurd. Run ps -e to get an idea of all the other things your machine is doing. – John Bollinger

Multitasking is achieved by the operating system (specifically the kernel's scheduler) letting one task run for a while, then pausing it and allowing another to run.

So effectively your code gets run for a bit, halted while others run, then run for a bit, and so on.

This accounts for the variation in times you see, because you are measuring elapsed wall-clock time, not "CPU time" (the time your thread actually spends running). C has standard functions for measuring CPU time, such as clock(), which is documented in the GNU C Library manual.
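As a rough sketch of that idea (my illustration, not the function the GNU manual describes), a per-thread CPU-time variant of the question's now_us() could use the POSIX call clock_gettime() with CLOCK_THREAD_CPUTIME_ID, which only advances while the thread is actually running; on older glibc versions this may need linking with -lrt.

#include <time.h>

/* Rough sketch: measure per-thread CPU time instead of wall-clock time.
   CLOCK_THREAD_CPUTIME_ID does not advance while the thread is preempted,
   so scheduler-induced gaps disappear from the measurement. */
static inline unsigned long cpu_now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return (unsigned long)ts.tv_sec * 1000000UL
         + (unsigned long)ts.tv_nsec / 1000;
}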

CPU scheduling is covered in more detail here.

Finally, in order not to be preempted, you would need to run your code either in kernel space, on bare metal, or within a "real-time" operating system. (I'll let you google what those terms mean :-) )

The only other option would be to explore Linux/Unix "nice values" (I'll let you google this as well, but basically a nice value assigns a higher or lower priority to your process).
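As a hedged sketch only (not something from the original post), a nice value can also be set from inside the program with setpriority() from <sys/resource.h>; the helper name raise_priority is mine, negative nice values usually require root or CAP_SYS_NICE, and for hard real-time priorities sched_setscheduler() with SCHED_FIFO would be the stronger tool.

#include <stdio.h>
#include <sys/resource.h>

/* Rough sketch: raise the priority of the calling process by lowering its
   nice value. PRIO_PROCESS with pid 0 means "the calling process". */
static int raise_priority(void)
{
    if (setpriority(PRIO_PROCESS, 0, -10) != 0) {
        perror("setpriority");
        return -1;
    }
    return 0;
}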

If this sort of thing interests you, there is an excellent book by Robert Love titled "Linux Kernel Development".

Upvotes: 1
