curiousguy

Reputation: 8318

Does the inline asm compiler barrier (memory clobber) count as an external function call, or as a static function call?

Introduction/confirmation of basic facts

It is well known that with GCC style C and C++ compilers, you can use inline assembly with a "memory" clobber:

asm("":::"memory");

to prevent reordering of (most) code past it, so that it acts as a (thread-local) "memory barrier" (for example, when interacting with asynchronous signals).

Note: these "compiler barriers" do NOT provide inter-thread synchronization.
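A minimal sketch of the signal use case, assuming GCC or Clang (extended asm syntax). The macro name COMPILER_BARRIER and the helper publish_then_signal are hypothetical. The barrier keeps the store to the plain payload ahead of the store to the volatile flag, since volatile alone does not order non-volatile accesses. Here the signal is delivered synchronously by raise(), which is itself an opaque library call, so in this toy the barrier only stands in for the asynchronous case.

```c
#include <signal.h>

/* Hypothetical macro name: GCC/Clang extended asm with a "memory" clobber.
   It emits no instruction; it only restricts compiler reordering. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static int payload;                 /* plain data handed to the handler */
static volatile sig_atomic_t ready; /* flag the handler tests first */
static int seen = -1;               /* what the handler observed */

static void on_signal(int sig) {
    (void)sig;
    if (ready)
        seen = payload;
}

/* Hypothetical helper: publish a value, then deliver a signal. */
int publish_then_signal(int value) {
    signal(SIGINT, on_signal);
    ready = 0;
    payload = value;
    COMPILER_BARRIER(); /* the payload store may not sink below the flag store */
    ready = 1;
    raise(SIGINT);      /* synchronous delivery keeps the demo deterministic */
    return seen;
}
```

Calling publish_then_signal(42) returns the value the handler read, 42.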

This asm statement does the equivalent of a call to a non-inline function, potentially reading every object that can be read outside the current scope and altering every object that can be altered (non-const objects):

int i;

void f() {
   int j = i;
   asm("":::"memory"); // can change i
   j += i; // not j *= 2
   // ... (assume j isn't unused)
}

Essentially it's the same as calling a separately compiled NOP function, except that the non-inline NOP call is later (1) inlined, so nothing of it survives.

(1) Say, after the compiler's middle pass, once analysis is done.

So here j cannot be changed, as it's local, and is still a copy of the old i value; but i might have changed, so the compilation is pretty much the same as:

volatile int vi;

int f2() {
   int j = vi;
   ; // can "change" vi
   j += vi; // not j *= 2
   return j;
}

Both reads of vi are needed (for a different reason), so the compiler doesn't turn that into 2*vi.
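To make the "separately compiled NOP function" analogy concrete in a single file, one option is to call an empty function through a volatile function pointer: the compiler cannot know which function will run, so it has to assume the call may read and write any global it can reach. All names below (nop, bump_i, opaque_call, f_barrier, f_opaque, use_bumping_call) are hypothetical, and the asm form assumes GCC or Clang.

```c
int i;

static void nop(void) {}
static void bump_i(void) { i++; }

/* Volatile pointer: the call target is opaque to the optimizer. */
void (*volatile opaque_call)(void) = nop;

/* Mirrors f() above, with the asm statement. */
int f_barrier(void) {
    int j = i;
    __asm__ __volatile__("" ::: "memory"); /* i must be re-read below */
    j += i;
    return j;
}

/* Same shape, with the barrier replaced by an opaque call. */
int f_opaque(void) {
    int j = i;
    opaque_call(); /* may change i, so i must be re-read below */
    j += i;
    return j;
}

/* Hypothetical demo helper: make the opaque call actually modify i. */
void use_bumping_call(void) { opaque_call = bump_i; }
```

With i == 3, both functions return 6; after use_bumping_call(), f_opaque() returns 3 + 4 = 7, showing that the second read really observes the change.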

Is my understanding correct up to that point? (I presume it is. Otherwise the question doesn't make sense.)

The real issue: extern or static

The above was just the preamble. The issue I have is with static variables and possible calls to static functions (or their C++ equivalent, members of anonymous namespaces):

If they aren't named explicitly in the input operands of the asm statement, can a memory clobber access static data that isn't otherwise accessible through non-static functions, and call static functions that aren't otherwise callable, given that none of these are visible at the link stage from other modules?

static int si;

int f3() {
   int j = si;
   asm("":::"memory"); // can access si?
   j += si; // optimized to j = si*2; ?
   return j;
}

[Note: the use of static is a little ambiguous. The point is that the boundary of the TU matters and that the static variable is private to the TU, but I have not described how it is manipulated. Let's assume it really is manipulated in that TU; otherwise the compiler might assume it's effectively a constant.]

In other words, is that "clobber" the equivalent of a call to:

?

Bonus question: global optimization

If the answer is that static variables aren't treated like extern variables in that case, what is the impact when compiling the whole program at once? More specifically:

During global compilation of the whole program, with global analysis and inference over variable values, is the fact that, for example, a global variable is never modified (or never assigned a negative value...), except possibly in an asm "clobber", used as an input to the optimizer?

In other words, if a non-static i is only named in one TU, can it be optimized as if it were a static int even if there are asm statements? Should global variables be explicitly listed as clobbers in that case?
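One way to sidestep the ambiguity for f3() is to name si explicitly as an in/out memory operand, so the compiler must assume the asm statement may read and modify it, however the bare "memory" clobber is interpreted. This is a sketch assuming GCC or Clang extended asm; f3_explicit and set_si are hypothetical names. It does not answer what the bare clobber does; it only makes the dependency explicit.

```c
static int si;

/* Hypothetical setter so that si is really manipulated in this TU. */
void set_si(int v) { si = v; }

int f3_explicit(void) {
    int j = si;
    /* "+m"(si): si is an input and an output of the (empty) template. */
    __asm__ __volatile__("" : "+m"(si) : : "memory");
    j += si; /* cannot be folded into si * 2 */
    return j;
}
```

Since the template is empty, si is unchanged at run time, so after set_si(5) the result is 10; the difference is only in what the optimizer may assume.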

Upvotes: 4

Views: 840

Answers (1)

Brendan

Reputation: 37242

You wrote: "It does the equivalent of a call to a non inline function, potentially reading all objects that can be read outside of the current scope and altering all those that can be altered (non const objects)."

No.

The compiler can decide to inline any function in the same compilation unit (and then, if the function wasn't static, also emit a separate non-inlined copy for callers in other compilation units, so that the linker can find one); and with link-time optimization/link-time code generation, the linker can decide to inline functions across compilation units. The only case where it's currently impossible for a function to be inlined is when it lives in a shared library, and that limitation exists only because operating systems aren't capable of "load-time optimization".

In other words, any appearance of any kind of barrier for any function is an unintended side effect of optimizer weaknesses, is not guaranteed, and therefore cannot/should not be relied on.
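A small sketch of that point: when the callee's body is visible, the compiler may inline it, see that it touches nothing, and fold the two reads of a global into one. The names g, visible_nop and twice_g are hypothetical. Whether the fold actually happens depends on the compiler and optimization level; the result is the same either way, which is why such a call gives no reliable ordering.

```c
int g;

static void visible_nop(void) {}

int twice_g(void) {
    int j = g;
    visible_nop(); /* not a barrier: may be inlined away entirely */
    j += g;        /* the optimizer may compute this as g * 2 */
    return j;
}
```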

The real issue: inline assembly

There are 5 possibilities:

a) The compiler understands all assembly, and is able to examine the inline assembly and determine what is/isn't clobbered; there is no clobber list (and no need for one). In this case (depending on how advanced the compiler/optimiser is) the compiler may be able to determine things like "this area of memory may be clobbered but that area of memory won't be clobbered" and avoid the cost of reloading data from the area of memory that wasn't clobbered.

b) The compiler doesn't understand any assembly and there is no clobber list, so the compiler has to assume everything will be clobbered; this means it has to generate code that saves everything (e.g. in-use values in registers) to memory before the inline assembly is executed and reloads everything afterwards, which gives extremely bad performance.

c) The compiler doesn't understand any assembly, and expects the programmer to provide a clobber list to avoid (some of) the performance disaster of having to assume everything will be clobbered.

d) The compiler understands some assembly but not all assembly, and doesn't have a clobber list. If it doesn't understand the assembly it assumes everything may have been clobbered.

e) The compiler understands some assembly but not all assembly, and does have an (optional?) clobber list. If it doesn't understand the assembly it relies on the clobber list (and/or falls back to "assume everything is clobbered" if there is no clobber list), and if it does understand the assembly it ignores the clobber list.

Of course a compiler that uses "option c)" can be improved to use "option e)"; and a compiler that uses "option e)" can be improved to use "option a)".

In other words, any appearance of any kind of barrier for something like asm("":::"memory"); is an unintended side effect of the compiler being "improvable", and therefore cannot/should not be relied on.
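With GCC-style extended asm, the programmer's description in option c) is made of operands and clobbers together, and it can be much narrower than "memory". The sketch below (assuming GCC or Clang; opaque_value and square_opaque are hypothetical names) tells the compiler only that x sits in a register and may have changed, so unrelated values need not be spilled or reloaded.

```c
/* "+r"(x): x is read and possibly rewritten by the (empty) template.
   No "memory" clobber: nothing else is declared as affected. */
static inline int opaque_value(int x) {
    __asm__ __volatile__("" : "+r"(x));
    return x;
}

/* Hypothetical use: the compiler cannot constant-fold v * v
   for a known n, because v is opaque after the asm statement. */
int square_opaque(int n) {
    int v = opaque_value(n);
    return v * v;
}
```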

Summary

None of the things you've mentioned are actually a barrier of any kind. It's all just "unintended and undesired failure to optimize".

If you do need a barrier, then use an actual barrier (e.g. asm("mfence":::"memory"); on x86). However, unless you need inter-thread synchronization and aren't using atomics, it's extremely likely that you don't need a barrier in the first place.
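For the inter-thread case, a sketch of the "use atomics" route, assuming a C11 compiler with <stdatomic.h> (publish and try_consume are hypothetical names). The release store makes the earlier write to data visible to any thread whose acquire load sees ready == true, with no hand-written fence. Where an explicit full fence is really wanted, standard C offers atomic_thread_fence(memory_order_seq_cst), which x86 compilers typically emit as mfence.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int data;                   /* plain payload */
static atomic_bool ready = false;  /* publication flag */

void publish(int value) {
    data = value;
    /* release: the store to data cannot be reordered after this store */
    atomic_store_explicit(&ready, true, memory_order_release);
}

bool try_consume(int *out) {
    /* acquire: if ready is seen as true, data is guaranteed visible */
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = data;
    return true;
}
```

In real use, publish() runs on one thread and try_consume() is polled on another.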

Upvotes: 0
