John

Reputation: 2326

cost of atomic<int>::fetch_add versus __sync_fetch_and_add

I was doing some research on atomics with g++ 4.4.6 on Linux. I used a simple loop to estimate the time it takes to do a fetch_add(1) on an atomic:

atomic<int> ia;
ia.store(0);
timespec start,stop;
clock_gettime(CLOCK_REALTIME, &start);
while (ia < THE_MAX)
{
    //++ia;
    ia.fetch_add(1);
}
clock_gettime(CLOCK_REALTIME, &stop);

I was surprised to find that the following ran in about half the time:

volatile int ia=0;
timespec start,stop;
clock_gettime(CLOCK_REALTIME, &start);
while (ia < THE_MAX)
{
    __sync_fetch_and_add( &ia, 1 );
}
clock_gettime(CLOCK_REALTIME, &stop);
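
For completeness, the elapsed time comes from the two timespec samples in the usual way; something like this (a sketch, not my exact harness code):

// Nanoseconds elapsed between the two clock_gettime() samples.
long long elapsed_ns = (stop.tv_sec - start.tv_sec) * 1000000000LL
                     + (stop.tv_nsec - start.tv_nsec);
double per_op_ns = double(elapsed_ns) / THE_MAX;  // rough cost per increment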

I disassembled it - not that I'm very good at x86 assembly - and I see this main difference. The C++11 atomic's fetch_add generated

call    _ZNVSt9__atomic213__atomic_baseIiE9fetch_addEiSt12memory_order

whereas the gcc builtin gave

lock addl   $1, (%eax)

I would expect g++ to give me the best option, so I'm thinking there's a serious gap in my understanding of what is going on. Is it clear to anyone out there why the C++11 call didn't generate code as good as the gcc builtin? (Maybe it is just an issue of g++ 4.4's atomics support not being very mature...) Thanks.

Upvotes: 2

Views: 2256

Answers (1)

David Schwartz

Reputation: 182819

It's just a matter of GCC version and optimizations. For example, with gcc 4.6.3 and -O3, I get a lock add for atomic<int>::fetch_add.

#include <atomic>
void j(std::atomic<int>& ia)
{
        ia.fetch_add(1);
} 

Yields (for x86_64 with -O3 and gcc-4.6.3):

.LFB382:
    .cfi_startproc
    lock addl       $1, (%rdi)
    ret
    .cfi_endproc
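
If upgrading isn't an option, you can keep using the builtin directly. Here's a minimal wrapper sketch (assuming x86/x86_64 and that you only need counter-style increments, not the full std::atomic<int> interface; the atomic_counter name is just for illustration):

// Fallback counter for an older GCC whose fetch_add ends up as an
// out-of-line library call: wrap the __sync builtin directly.
struct atomic_counter
{
    volatile int value;

    explicit atomic_counter(int v = 0) : value(v) {}

    // Atomically adds n and returns the previous value,
    // mirroring std::atomic<int>::fetch_add.
    int fetch_add(int n)
    {
        return __sync_fetch_and_add(&value, n);
    }
};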

Upvotes: 3
