Reputation: 179
I have a C program which creates two threads (apart from main), T1 and T2. T1 executes a function which issues an operation O1 and T2 executes a function which issues an operation O2.
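#include <pthread.h>
#include <stdio.h>

void O1(void);   /* operations under test, defined elsewhere */
void O2(void);

int var;         /* shared, so f1() and f2() can see it; last writer wins */

void *f1(void *arg) {
    O1();
    var = 0;
    return NULL;
}
void *f2(void *arg) {
    O2();
    var = 1;
    return NULL;
}
int main(int argc, char **argv) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, f1, NULL);
    pthread_create(&t2, NULL, f2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("var = %d\n", var);
    return 0;
}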
t1 and t2 each get assigned to different physical cores. The objective of this program is to check which operation was faster by inspecting the value of var after both threads have finished executing. This would require that O1() and O2() run at exactly the same time (or with a very slight, tolerable difference on the order of a few cycles) in parallel on both cores. How can I go about ensuring this?
Edit: Based on Peter Cordes' suggestion, I've modified f1() and f2() to read the timestamp for synchronized execution of O1() and O2().
void* f1() {
t1 = rdtsc();
while(t1 != 0){
t1 = rdtsc();
}
printf("t1 = %d\n", t1);
O1();
var = 0;
}
void* f2() {
t2 = rdtsc();
while(t2 != 0){
t2 = rdtsc();
}
printf("t2 = %d\n", t2);
O2();
var = 1;
}
However, t2 gets printed on the console much later than t1 does. I guess this suggests that rdtsc has looped over to 0 in f2() and doesn't result in a synchronized execution of O1() and O2(). Thread barriers didn't offer the granularity of synchronization I require.
Upvotes: 2
Views: 478
Reputation: 8404
The most accurate way of establishing whether O1() or O2() was faster would be a benchmark of each. There are very accurate ways to measure elapsed execution time, and certainly running O1() a few times and then O2() a few times and recording the start/stop times will give an accurate average answer. The more runs are included in the average, the more accurate the result will be, and the more certain one can be of the standard deviation on the result.
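As a rough illustration of this kind of benchmark, here is a minimal sketch assuming a POSIX system with clock_gettime and CLOCK_MONOTONIC. The bodies of O1() and O2() are hypothetical stand-ins for the operations in the question, and now_ns() and mean_ns() are helper names invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for the question's O1() and O2(). */
static volatile uint64_t sink;
static void O1(void) { for (int i = 0; i < 1000; i++) sink += (uint64_t)i; }
static void O2(void) { for (int i = 0; i < 2000; i++) sink += (uint64_t)i; }

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Average time per call of op, in nanoseconds, over `runs` calls. */
static double mean_ns(void (*op)(void), int runs) {
    for (int i = 0; i < 10; i++) op();   /* warm up caches and predictors */
    uint64_t start = now_ns();
    for (int i = 0; i < runs; i++) op();
    uint64_t stop = now_ns();
    return (double)(stop - start) / runs;
}
```

Comparing mean_ns(O1, 100000) with mean_ns(O2, 100000) from a thread pinned to one core would then show which operation is faster on average. Repeating the measurement several times gives an idea of the spread.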
Relying on the OS somehow starting up threads instantaneously will not be as good. There is no guarantee that the OS will even keep running main() after the first thread is started; some OSes will let the newly created thread run for a while instead of its creating thread, in case it completes quickly (which some threads do).
Upvotes: 0
Reputation: 50278
f1 and f2 will almost certainly be called with a small delay between them in practice on most platforms, but the delay depends on the hardware, the operating system (OS), and especially its scheduler. In theory, it is not possible to guarantee that the two functions always start at the same time on all platforms. Indeed, the OS scheduler is free to schedule the threads on the same core, and even if you bind threads to cores, a thread can be interrupted at any time (e.g. by a higher-priority task). Furthermore, core clocks are not strongly synchronized on most modern processors. That being said, a barrier is clearly sufficient in practice to make functions run at approximately the same time (with a granularity close to a few microseconds on most systems, possibly even less). Pthreads provides such a feature (see pthread_barrier_init and pthread_barrier_wait for example). Note that a spin-wait might be needed for better precision (typically 1-10 ns, possibly slightly less depending on the hardware). AFAIK it is not possible to synchronize threads with a precision better than several dozen cycles on x86 processors. This is because modern processors execute instructions in parallel and out of order, with a long and complex pipeline, and any inter-core synchronization is particularly slow (typically because of the long path to take, the cache coherence protocol, and fundamental physical laws).
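The spin-wait idea can be sketched roughly as follows, assuming C11 atomics and POSIX threads. Each thread announces its arrival and then spins until all threads have arrived, instead of sleeping in pthread_barrier_wait. The names arrived, winner, runner and race_once are invented for this example, O1() and O2() are empty hypothetical stand-ins, and the sketch records the first finisher with a compare-and-swap rather than the question's last-writer-wins var. The threads still leave the spin loop a few cycles apart at best, and core pinning is not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 2

static atomic_int arrived;   /* number of threads that reached the start line */
static atomic_int winner;    /* 0 = nobody yet, otherwise id of first finisher */

/* Hypothetical stand-ins for the question's O1() and O2(). */
static void O1(void) {}
static void O2(void) {}

struct job { int id; void (*op)(void); };

static void *runner(void *arg) {
    struct job *j = arg;
    atomic_fetch_add(&arrived, 1);
    while (atomic_load(&arrived) < NTHREADS) {
        /* spin: burn cycles until every thread has arrived */
    }
    j->op();
    int expected = 0;
    /* Only the first thread to get here manages to store its id. */
    atomic_compare_exchange_strong(&winner, &expected, j->id);
    return NULL;
}

/* Runs O1 and O2 once in parallel; returns 1 or 2 for the first finisher. */
static int race_once(void) {
    pthread_t t[NTHREADS];
    struct job jobs[NTHREADS] = { {1, O1}, {2, O2} };
    atomic_store(&arrived, 0);
    atomic_store(&winner, 0);
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, runner, &jobs[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);
    return atomic_load(&winner);
}
```

A single call to race_once() says little on its own, because the release from the spin loop is itself skewed by cache-coherence latency. Calling it many times and counting how often each id wins gives a more meaningful picture.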
Upvotes: 3