John

Reputation: 2326

cost of atomic<int>::fetch_add versus __sync_fetch_and_add

I was doing some research on atomics with g++ 4.4.6 on Linux. I used a simple loop to estimate the time it takes to do a fetch_add(1) on an atomic:

atomic<int> ia;
ia.store(0);
timespec start,stop;
clock_gettime(CLOCK_REALTIME, &start);
while (ia < THE_MAX)
{
    //++ia;
    ia.fetch_add(1);
}
clock_gettime(CLOCK_REALTIME, &stop);

I was surprised to find that the following ran in about half the time:

volatile int ia=0;
timespec start,stop;
clock_gettime(CLOCK_REALTIME, &start);
while (ia < THE_MAX)
{
    __sync_fetch_and_add( &ia, 1 );
}
clock_gettime(CLOCK_REALTIME, &stop);
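
For completeness, the elapsed time comes from the two timespec samples in the usual way; something like this (a sketch, not my exact harness code):

// Nanoseconds elapsed between the two clock_gettime() samples.
long long elapsed_ns = (stop.tv_sec - start.tv_sec) * 1000000000LL
                     + (stop.tv_nsec - start.tv_nsec);
double per_op_ns = double(elapsed_ns) / THE_MAX;  // rough cost per increment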

I disassembled it - not that I'm very good at x86 assembly - and I see this main difference. The C++11 atomic's fetch_add generated

call    _ZNVSt9__atomic213__atomic_baseIiE9fetch_addEiSt12memory_order

whereas the gcc builtin gave

lock addl   $1, (%eax)

I would expect g++ to give me the best option, so I'm thinking there's a serious gap in my understanding of what is going on. Is it clear to anyone out there why the C++11 call didn't generate code as good as the gcc builtin? (Maybe it is just an issue of g++ 4.4's atomics support not being very mature...) Thanks.

Upvotes: 2

Views: 2256

Answers (1)

David Schwartz

Reputation: 182819

It's just a matter of GCC version and optimizations. For example, with gcc 4.6.3 and -O3, I get a lock add for atomic<int>::fetch_add.

#include <atomic>
void j(std::atomic<int>& ia)
{
        ia.fetch_add(1);
} 

Yields (for x86_64 with -O3 and gcc-4.6.3):

.LFB382:
    .cfi_startproc
    lock addl       $1, (%rdi)
    ret
    .cfi_endproc
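
If upgrading isn't an option, you can keep using the builtin directly. Here's a minimal wrapper sketch (assuming x86/x86_64 and that you only need counter-style increments, not the full std::atomic<int> interface; the atomic_counter name is just for illustration):

// Fallback counter for an older GCC whose fetch_add ends up as an
// out-of-line library call: wrap the __sync builtin directly.
struct atomic_counter
{
    volatile int value;

    explicit atomic_counter(int v = 0) : value(v) {}

    // Atomically adds n and returns the previous value,
    // mirroring std::atomic<int>::fetch_add.
    int fetch_add(int n)
    {
        return __sync_fetch_and_add(&value, n);
    }
};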

Upvotes: 3
