Reputation: 95
We are trying to implement a program that sends commands to a robot in a given cycle time. Thus this program should be a real-time application. We set up a pc with a preempted RT Linux kernel and are launching our programs with chrt -f 98 or chrt -rr 99 to define the scheduling policy and priority. Loading of the kernel and launching of the program seems to be fine and work (see details below).
Now we were measuring the time (CPU ticks) it takes our program to be computed. We expected this time to be constant with very little variation. What we measured though, were quite significant differences in computation time. Of course, we thought this could be undefined behavior in our rather complex program, so we created a very basic program and measured the time as well. The behavior was similarly bad.
First of all, we installed an RT Linux Kernel on the PC using this tutorial. The main characteristics of the PC are:
PC Characteristics | Details |
---|---|
CPU | Intel(R) Atom(TM) Processor E3950 @ 1.60GHz with 4 cores |
Memory RAM | 8 GB |
Operating System | Ubunut 20.04.1 LTS |
Kernel | Linux 5.9.1-rt20 SMP PREEMPT_RT |
Architecture | x86-64 |
The first time we detected this problem was when we were measuring the time it takes to execute this "complex" program with a single thread. We did a few tests with this program but also with a simpler one:
We also did a latency test on the PC.
For this one, we followed this tutorial, and these are the results:
The processes are shown in htop with a priority of RT
We called the function multiple times in the program and measured the time each takes. The results of the 2 tests are:
From this we observed that:
We did the same test but this time with a simpler program:
#include <vector>
#include <iostream>
#include <time.h>
int main(int argc, char** argv) {
int iterations = 5000;
double a = 5.5;
double b = 5.5;
double c = 4.5;
std::vector<double> wallTime(iterations, 0);
std::vector<double> cpuTime(iterations, 0);
struct timespec beginWallTime, endWallTime, beginCPUTime, endCPUTime;
std::cout << "Iteration | WallTime | cpuTime" << std::endl;
for (unsigned int i = 0; i < iterations; i++) {
// Start measuring time
clock_gettime(CLOCK_REALTIME, &beginWallTime);
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &beginCPUTime);
// Function
a = b + c + i;
// Stop measuring time and calculate the elapsed time
clock_gettime(CLOCK_REALTIME, &endWallTime);
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endCPUTime);
wallTime[i] = (endWallTime.tv_sec - beginWallTime.tv_sec) + (endWallTime.tv_nsec - beginWallTime.tv_nsec)*1e-9;
cpuTime[i] = (endCPUTime.tv_sec - beginCPUTime.tv_sec) + (endCPUTime.tv_nsec - beginCPUTime.tv_nsec)*1e-9;
std::cout << i << " | " << wallTime[i] << " | " << cpuTime[i] << std::endl;
}
return 0;
}
We understand that:
Of course, we can give more details.
Thanks a lot for your help!
Upvotes: 0
Views: 312
Reputation: 12362
Your function will near certainly be optimized away so you are just measuring how long it takes to read the clocks. And as you can see that doesn't take very long with some exceptions:
The very first time you run the code (unless you just compiled it) the pages need to be loaded from disk. If you are unlucky the code spans pages and you include the loading of the next page in the measured time. Quite unlikely given the code size.
The first loop the code and any data needs to be loaded into cache. So that takes longer to execute. The branch predictor might also need a few loops to predict the loop right so the second, third loop might be slightly longer too.
For everything else I think you can blame scheduling:
You can do little about IRQs. Some you can fix to specific cores but others are just essential (like the timer interrupt for the scheduler itself). You kind of just have to live with that.
But you can fix your program to a specific CPU and you can fix everything else to all the other cores. Basically reserving the core for the real-time code. I guess you would have to use cgroups for this, to keep everything else off the chosen core. And you might still get some kernel threads run on the reserved core. Nothing you can do about that. But that should eliminate most of the large execution times.
Upvotes: 1