Reputation: 2326
I was doing some research on g++ 4.4.6 on Linux related to atomics. I had a simple loop that I was using to estimate the time it took to do a fetch_add(1) on an atomic.
atomic<int> ia;
ia.store(0);
timespec start,stop;
clock_gettime(CLOCK_REALTIME, &start);
while (ia < THE_MAX)
{
    //++ia;
    ia.fetch_add(1);
}
clock_gettime(CLOCK_REALTIME, &stop);
I was surprised to find that the following ran in about half the time:
volatile int ia=0;
timespec start,stop;
clock_gettime(CLOCK_REALTIME, &start);
while (ia < THE_MAX)
{
    __sync_fetch_and_add( &ia, 1 );
}
clock_gettime(CLOCK_REALTIME, &stop);
I disassembled it - not that I'm very good with x86 assembly - and I see one main difference. The C++11 atomic call generated
call _ZNVSt9__atomic213__atomic_baseIiE9fetch_addEiSt12memory_order
whereas the gcc atomic gave
lock addl $1, (%eax)
I would expect g++ to give me the best option, so I'm thinking there's some serious gap in my understanding of what is going on. Is it clear to anyone out there why the C++11 call didn't generate code as good as the gcc builtin? (Maybe it is just an issue of g++ 4.4 not being very mature...) Thanks.
Upvotes: 2
Views: 2256
Reputation: 182819
It's just a matter of GCC version and optimizations. For example, with gcc 4.6.3 and -O3, I get a lock addl for atomic<int>::fetch_add.
#include <atomic>
void j(std::atomic<int>& ia)
{
    ia.fetch_add(1);
}
Yields (for x86_64 with -O3 and gcc-4.6.3):
.LFB382:
        .cfi_startproc
        lock addl $1, (%rdi)
        ret
        .cfi_endproc
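If you want to check what your own compiler does, here's a small sketch that puts both forms side by side in one translation unit (the function name k and the file name atomic_add.cpp are just placeholders I picked); compiling it with something like g++ -std=c++0x -O3 -S atomic_add.cpp lets you compare the generated assembly directly:
#include <atomic>

// std::atomic form - with a recent enough g++ and optimization enabled,
// this should come out as a single "lock addl", same as the builtin below.
void j(std::atomic<int>& ia)
{
    ia.fetch_add(1);
}

// __sync builtin form for comparison.
void k(volatile int& ia)
{
    __sync_fetch_and_add(&ia, 1);
}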
Upvotes: 3