Damiano

Reputation: 716

Read optimizations on shared memory

Suppose you have a function that makes several read accesses to a shared variable whose access is atomic, all running in the same process. Imagine them as threads of a process, or as software running on a bare-metal platform with no MMU.

As a requirement, you must ensure that the value read is consistent for the whole length of the function, so the code must not re-read the memory location; it has to keep the value in a local variable or in a register. How can we ensure that this behaviour is respected?

As an example, where shared is the only shared variable:

#include <stdint.h>

extern uint32_t a, b, shared;

void useless_function()
{
  __ASM volatile ("":::"memory"); /* compiler barrier: shared must be re-read after this point */
  uint32_t value = shared;
  a = value * 2;
  b = value << 3;
}

Can value be optimized away and replaced by direct reads of the shared variable in some contexts? If so, how can I be sure this cannot happen?

Upvotes: 2

Views: 401

Answers (2)

Maxim Egorushkin

Reputation: 136306

As a requirement, you must ensure that the value read is consistent for the whole length of the function, so the code must not re-read the memory location; it has to keep the value in a local variable or in a register. How can we ensure that this behaviour is respected?

You can do that with the READ_ONCE macro from the Linux kernel:

/*
 * Prevent the compiler from merging or refetching reads or writes. The
 * compiler is also forbidden from reordering successive instances of
 * READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some
 * particular ordering. One way to make the compiler aware of ordering is to
 * put the two invocations of READ_ONCE or WRITE_ONCE in different C
 * statements.
 *
 * These two macros will also work on aggregate data types like structs or
 * unions. If the size of the accessed data type exceeds the word size of
 * the machine (e.g., 32 bits or 64 bits) READ_ONCE() and WRITE_ONCE() will
 * fall back to memcpy(). There's at least two memcpy()s: one for the
 * __builtin_memcpy() and then one for the macro doing the copy of variable
 * - '__u' allocated on the stack.
 *
 * Their two major use cases are: (1) Mediating communication between
 * process-level code and irq/NMI handlers, all running on the same CPU,
 * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
 * mutilate accesses that either do not require ordering or that interact
 * with an explicit memory barrier or atomic instruction that provides the
 * required ordering.
 */

E.g.:

uint32_t value = READ_ONCE(shared);

The READ_ONCE macro essentially casts the object you read to volatile, because the compiler cannot emit extra reads or writes of volatile objects.

The above is equivalent to:

uint32_t value = *(uint32_t volatile*)&shared;
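For scalar types, such a macro can be sketched in a few lines. This is a simplified illustration, not the kernel's actual implementation; READ_ONCE_SCALAR is a hypothetical name and __typeof__ is a GCC/Clang extension:

/* Simplified sketch: route the access through a volatile-qualified
 * lvalue so the compiler must perform exactly this one load. */
#define READ_ONCE_SCALAR(x) (*(volatile __typeof__(x) *)&(x))

uint32_t value = READ_ONCE_SCALAR(shared);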

Alternatively:

uint32_t value;
memcpy(&value, &shared, sizeof value);

memcpy breaks the dependency between shared and value, so that the compiler cannot re-load shared instead of loading value.
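For completeness, here is a sketch of the example function from the question rewritten with the memcpy approach (same variable names as in the question):

#include <stdint.h>
#include <string.h>

extern uint32_t a, b, shared;

void useless_function(void)
{
  uint32_t value;
  memcpy(&value, &shared, sizeof value); /* copy shared once into a local */
  a = value * 2;
  b = value << 3;
}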

Upvotes: 1

Rishikesh Raje
Rishikesh Raje

Reputation: 8614

In the example as originally given, the variable value was not used in the function at all, so it would definitely have been optimised out.

Also, as mentioned in the comments, in a multitasking system the value of shared can change while the function is running.

What I need is that shared is read only once and its local value is kept for the whole length of the function and not re-evaluated

I would suggest something like the code below.

#include <stdint.h>

extern uint32_t a, b, shared;

void useless_function()
{
  __ASM volatile ("":::"memory"); /* compiler barrier: shared must be re-read after this point */
  uint32_t value = shared;
  a = value * 2;
  b = value << 3;
}

Here shared is read only once in the function. It will be read again on the next call of the function.
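If the worry is that the optimiser could still replace value with repeated reads of shared, the local copy can be combined with the volatile-qualified access from the other answer. A minimal sketch, using the same variable names:

#include <stdint.h>

extern uint32_t a, b, shared;

void useless_function(void)
{
  /* exactly one load of shared: volatile accesses may not be added or removed */
  uint32_t value = *(volatile uint32_t *)&shared;
  a = value * 2;
  b = value << 3;
}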

Upvotes: 0
