Reputation: 139
I have a piece of old legacy code that does:
if (current_Value > g_max_Value) g_max_Value=current_Value
As you understand with all modern super multi-threading, multi-cpu and huge CPU cache, this code does not work well. Question: How write it reliably, but elegant?
Quick solution is to wrap it in critical section. But if I understand correctly this does not guaranty atomic on CPU level.
Upvotes: 1
Views: 1221
Reputation: 364068
If multiple threads could possibly be updating g_max_Value
at the same time, you need an atomic cmpxchg.
If not, then you don't, even if other threads could be reading it while one thread writes it. You might still need to ensure the stores and loads are atomic, but you don't need an expensive atomic read-modify-write if only one thread is ever writing it at the same time.
If you have any requirements on the order in which updates become visible to other threads, then you also need release / acquire memory ordering or something like that. If not, then "relaxed" memory ordering will ensure that operations are atomic, but won't waste instructions on memory barriers or stop the optimizer reordering at compile time.
ISO C11 already provides atomic compare-exchange as part of the language. Of course, it's an exchange-if-equal because that's what hardware typically provides, so you'll need a loop to retry.
The basic idea is to do the compare for greater-than, then use an atomic cmpxchg for the swap, so the swap only happens if the global hasn't changed (so the compare result is still valid). If it has changed since the compare-for-greater, retry.
#include <stdatomic.h>
#include <stdbool.h>
atomic_int g_max_Value;
// if (current_Value > g_max_Value) g_max_Value=current_Value
bool update_gmaxval(int cur)
{
int tmpg = atomic_load_explicit(&g_max_Value, memory_order_relaxed);
if (cur <= tmpg)
return false;
// global value may change here but still be less than cur, so we need a loop insted of just a single cmpxchg_strong
while (!atomic_compare_exchange_weak_explicit(
&g_max_Value, &tmpg, cur,
memory_order_relaxed, memory_order_relaxed))
{
if (cur <= tmpg)
return false;
}
return true;
}
We could simplify by changing to a do{}while()
loop:
// if (current_Value > g_max_Value) g_max_Value=current_Value
bool update_gmaxval_v2(int cur)
{
int tmpg = atomic_load_explicit(&g_max_Value, memory_order_relaxed);
// global value may change here but still be less than cur, so we need a loop insted of just a single cmpxchg_strong
do {
if (cur <= tmpg)
return false;
} while (!atomic_compare_exchange_weak_explicit(
&g_max_Value, &tmpg, cur,
memory_order_relaxed, memory_order_relaxed));
return true;
}
This compiles to different code, but I'm not sure it's better.
I put the code up on the Godbolt compiler explorer to see if it compiled and look at the asm. Unfortunately Godbolt's ARM/ARM64/PPC compilers are too old (gcc 4.8), and don't support C11 stdatomic, so I could only look at the x86 asm where it doesn't matter that I used memory_order_relaxed
instead of memory_order_seq_cst
(lock
ed instructions are already full memory barriers, and normal loads are implicitly acquire-loads).
I did notice that these wrappers compile to significantly tighter code
void update_gmaxval_void(int cur) { update_gmaxval(cur); }
void update_gmaxval_v2_void(int cur) { update_gmaxval_v2(cur); }
because they don't have to return a value.
Upvotes: 1