Reputation: 60077
By my reading of the standard, *(_Atomic TYPE*)&(TYPE){0} (in words, casting a pointer to a non-atomic object to a pointer to the corresponding atomic type and dereferencing it) isn't supported.
Do gcc and/or clang recognize it as an extension if TYPE is/isn't lock-free? (Question 1)
Second and related question: I was under the impression that if TYPE couldn't be implemented as a lock-free atomic, a lock would need to be embedded in the corresponding _Atomic TYPE. But if I make TYPE a largish struct, then on both clang and gcc it has the same size as _Atomic TYPE.
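Whether a given TYPE falls into the lock-free case can be checked at compile time. A minimal sketch, assuming GCC or Clang: __atomic_always_lock_free is a GCC/Clang builtin rather than standard C, and BIG and the helper function names are hypothetical, chosen to mirror the largish struct described here. ATOMIC_INT_LOCK_FREE is the standard C11 macro, but it only covers the built-in integer and pointer types, not structs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical type with the same layout as the question's -DSTRUCT case. */
typedef struct {
    int x;
    char bytes[50];
} BIG;

/* __atomic_always_lock_free is a GCC/Clang builtin evaluated at compile time;
   a null second argument means "assume typical alignment for this size". */
bool int_always_lock_free(void)
{
    return __atomic_always_lock_free(sizeof(int), 0);
}

bool big_always_lock_free(void)
{
    return __atomic_always_lock_free(sizeof(BIG), 0);
}

/* Standard C11 alternative, available for built-in types only:
   2 = always lock-free, 1 = sometimes lock-free, 0 = never lock-free. */
int int_lock_free_macro(void)
{
    return ATOMIC_INT_LOCK_FREE;
}
```

On a typical x86-64 target the int check is true and the BIG check is false; the latter is the case where gcc and clang fall back to library calls such as __atomic_store.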
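Code for both problems:
#include <stdatomic.h>
#include <stdio.h>

/* Compile with -DSTRUCT to use a struct containing a 50-byte array instead of int. */
#if STRUCT
typedef struct {
    int x;
    char bytes[50];
} TYPE;
#else
typedef int TYPE;
#endif

TYPE x;  /* plain, non-atomic object */

void f(_Atomic TYPE *X)
{
    *X = (TYPE){0};  /* atomic (seq_cst) store through the _Atomic pointer */
}

void use_f(void)
{
    /* The questionable part: treating a non-atomic object as _Atomic via a cast. */
    f((_Atomic TYPE *)(&x));
}

int main(void)
{
    printf("%zu %zu\n", sizeof(TYPE), sizeof(_Atomic TYPE));
    return 0;
}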
Now, if I compile the above snippet with -DSTRUCT, both gcc and clang keep the struct and its atomic variant at the same size, and they generate a call to a function named __atomic_store for the store (resolved by linking with -latomic).
How does this work if no lock is embedded in the _Atomic version of the struct? (Question 2)
Upvotes: 6
Views: 1600
Reputation: 11
This method is not legal C11, but I managed to fool my compiler (Intel 2019) into casting between atomic and non-atomic "simple" types as follows.
Firstly I had a look inside stdatomic.h on my system (x86_64) to see how the various atomic types were actually defined. As far as I could make out, for simple integral types and pointers the atomic type was identical to the normal type, and moreover they were explicitly "lock free".
The next step was to use the sizeof operator to see how many bytes the atomic types actually used, and again I found that an atomic int was 4 bytes and an atomic pointer was 8 bytes, as I would expect on a 64-bit system.
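The size and lock-freedom checks described above can also be made at compile time, so that a build fails if they ever stop holding. A minimal sketch using only standard C11 (_Static_assert, _Alignof and the ATOMIC_*_LOCK_FREE macros); passing these checks does not make the cast legal C11, it only confirms that the layout assumption the cast relies on holds on the current target.

```c
#include <stdatomic.h>

/* Layout assumptions behind casting a plain pointer/int to its atomic form. */
_Static_assert(sizeof(_Atomic(void *)) == sizeof(void *),
               "atomic pointer has a different size from a plain pointer");
_Static_assert(_Alignof(_Atomic(void *)) == _Alignof(void *),
               "atomic pointer has a different alignment from a plain pointer");
_Static_assert(sizeof(atomic_int) == sizeof(int),
               "atomic int has a different size from a plain int");

/* 2 means "always lock-free"; anything less suggests a hidden lock may be used. */
#if ATOMIC_POINTER_LOCK_FREE != 2 || ATOMIC_INT_LOCK_FREE != 2
#error "pointers or ints are not always lock-free on this target"
#endif
```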
Explicit casting was banned by the compiler, but this worked:
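#include <stdatomic.h>
typedef struct { void *ptr; } IS_NORMAL;
typedef struct { _Atomic(void *) ptr; } IS_ATOMIC;  /* originally atomic_address; C11 spells an atomic pointer _Atomic(void *) */
IS_NORMAL a;
IS_ATOMIC *b = (IS_ATOMIC *)&a;  /* not valid C11: undefined behaviour */
void set_ptr(void *address)      /* hypothetical helper wrapping the original assignment */
{
    a.ptr = address;
    /* inspection in the debugger then shows that b->ptr is also address */
}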
It would happily allow me to cast between those two structure types as shown above, and when I used atomic functions (e.g. atomic_exchange()) on the IS_ATOMIC pointer variant, my debugger showed that the contents of the non-atomic structure changed to the expected value.
At which point you might ask "why do this?" The answer is that I have a multi-threaded application where I want to lock a database record for a short period of time so that a single thread can update it without contention from other threads, then release the lock when I am done. Historically I have protected this operation with a critical section, but this is very pessimistic: I might have, say, 10,000,000 records and be updating them at random, so the chance of two threads actually trying to update the same record is pretty small, yet a critical section blocks all threads unconditionally. Each record is referred to by a pointer, so the process:
So step (1) locks and step (4) unlocks and, unlike the critical section method, access only has to wait if two threads are trying to access the same address. It seems to work, and on my 6-core system (hyperthreading on, so 12 threads) it is about 5x faster than using a single critical section when working on a real dataset.
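A minimal sketch of one way such a per-address lock could be built from a pointer exchange, assuming a reserved sentinel pointer value marks a record as locked. This is not the author's code: the sentinel, RECORD_LOCKED, record_lock and record_unlock are hypothetical names, and the slot is declared as _Atomic(void *) to stay within standard C11 instead of casting a plain void * slot.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical sentinel: an address that no real record can ever have. */
static char locked_sentinel;
#define RECORD_LOCKED ((void *)&locked_sentinel)

/* Lock: swap the sentinel into the slot. If the old value was already the
   sentinel, another thread holds the record, so keep retrying.
   Returns the real record pointer, which the caller now owns exclusively. */
void *record_lock(_Atomic(void *) *slot)
{
    void *rec;
    while ((rec = atomic_exchange_explicit(slot, RECORD_LOCKED,
                                           memory_order_acquire)) == RECORD_LOCKED)
        ;  /* busy-wait: only threads targeting this same slot end up here */
    return rec;
}

/* Unlock: publish the record pointer again, making the updates visible. */
void record_unlock(_Atomic(void *) *slot, void *rec)
{
    atomic_store_explicit(slot, rec, memory_order_release);
}
```

A thread calls record_lock, updates the record through the returned pointer, then calls record_unlock with the same pointer. Threads working on different slots never wait on each other, which is the property the critical-section version lacked.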
So why not define the pointer to the record as atomic in the first place? The answer is that this particular code may make unthreaded access to that information in other places, and it may also make threaded access in a way that is known to be uncontended; in fact, in most situations I don't want the locking mechanism because of its cost. Timing tests suggest that a typical atomic lock/unlock operation takes around 5 to 10 nanoseconds on my system, and I want to avoid that overhead when I don't need it, so in those situations I simply use the raw pointer.
I'm offering this as the way that I solved this particular problem. I know that it is not correct C11, I know that it might only work on x86-type architectures (or at least only on architectures where integral and pointer types are lock-free and "intrinsically atomic"), and I also accept that there are probably better ways of locking a given address if you know how to write in assembler (which I don't). I'd be delighted to hear of a better solution.
Incidentally, I also tried transactional memory (i.e. _xbegin() ... _xend()) as a way of solving this problem. I found that it worked with small test problems, but once I scaled it up to real data I got frequent _xbegin() failures. I think this was because when the addresses you are accessing are not in cache memory it tends to bail out, forcing you to take your fallback code path. Intel are not very forthcoming about the details of how it works, so this explanation may be wrong.
I also had a look at Hardware Lock Elision as a way of speeding up the critical section method, but as far as I can see it is deprecated because of security vulnerabilities, and in any case I was too thick to understand how to use it!
Upvotes: 1
Reputation: 33719
_Atomic changes alignment in some corner cases on Clang, and GCC will likely be fixed in the future as well (PR 65146). In these cases, adding _Atomic through a cast does not work (which is fine from a C standard point of view because it is undefined behavior, as you pointed out).
If the alignment is correct, it is more appropriate to use the __atomic builtins, which were designed for exactly this use case.
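A minimal sketch of that approach, assuming GCC or Clang on a target where int and pointers are lock-free; the object and function names are hypothetical. The builtins take pointers to ordinary objects, so the non-atomic data never has to be cast to an _Atomic type.

```c
#include <stdbool.h>

/* Plain, non-_Atomic objects that are sometimes accessed atomically. */
int counter;
void *slot;

/* Atomic store to an ordinary int. */
void counter_reset(void)
{
    __atomic_store_n(&counter, 0, __ATOMIC_SEQ_CST);
}

/* Atomic read-modify-write; returns the value after the increment. */
int counter_bump(void)
{
    return __atomic_add_fetch(&counter, 1, __ATOMIC_SEQ_CST);
}

/* Atomically swap a plain pointer, returning its previous value. */
void *slot_swap(void *new_value)
{
    return __atomic_exchange_n(&slot, new_value, __ATOMIC_ACQ_REL);
}

/* Compare-and-swap on the plain pointer: succeeds only if slot still equals
   *expected; on failure, *expected is updated to the current value. */
bool slot_replace(void **expected, void *desired)
{
    return __atomic_compare_exchange_n(&slot, expected, desired, false,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}
```

For a non-lock-free type such as the struct in the question, the generic forms (for example __atomic_store(&x, &tmp, __ATOMIC_SEQ_CST), where tmp is a TYPE holding the new value) work the same way but compile to libatomic calls, so the program must be linked with -latomic.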
As described above, this will not work in cases where the ABI provides insufficient alignment for plain (non-atomic) types and where _Atomic would change alignment (with Clang only for now).
These builtins also work on plain (non-_Atomic) objects of types that are not lock-free, because they use out-of-line locks. This is also why no additional storage is required for _Atomic types, which use the same mechanism. One consequence is some unnecessary contention, because unrelated objects can end up sharing a lock. How these locks are implemented is an implementation detail which could change in future versions of libatomic.
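To illustrate the idea of out-of-line locks, here is a minimal sketch in standard C11 of a lock table indexed by object address. It is not libatomic's implementation: the table size, the address hash, the use of spinlocks rather than mutexes, and the names locked_store and locked_load are all hypothetical. Because the lock lives in a side table keyed on the object's address, the object itself needs no extra storage, and unrelated objects that map to the same entry contend with each other.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOCK_TABLE_SIZE 64  /* hypothetical size */

/* Atomic objects with static storage are zero-initialized to a valid state,
   so every lock starts out false (unlocked). */
static atomic_bool lock_table[LOCK_TABLE_SIZE];

/* Map an object's address to a lock; different objects may share one. */
static atomic_bool *lock_for(const void *obj)
{
    uintptr_t a = (uintptr_t)obj;
    return &lock_table[(a >> 4) % LOCK_TABLE_SIZE];
}

static void spin_lock(atomic_bool *l)
{
    while (atomic_exchange_explicit(l, true, memory_order_acquire))
        ;  /* busy-wait until the current holder releases it */
}

static void spin_unlock(atomic_bool *l)
{
    atomic_store_explicit(l, false, memory_order_release);
}

/* Copy size bytes from val into obj, atomically with respect to other
   locked_store/locked_load calls on the same obj. */
void locked_store(void *obj, const void *val, size_t size)
{
    atomic_bool *l = lock_for(obj);
    spin_lock(l);
    memcpy(obj, val, size);
    spin_unlock(l);
}

/* Copy size bytes from obj into ret under the same lock. */
void locked_load(void *ret, const void *obj, size_t size)
{
    atomic_bool *l = lock_for(obj);
    spin_lock(l);
    memcpy(ret, obj, size);
    spin_unlock(l);
}
```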
In general, for types with atomic builtins that involve locking, using them with shared or aliased memory mappings does not work. These builtins are not async-signal-safe, either. (All these features are technically outside the C standard anyway.)
Upvotes: 4