Reputation: 9278

_mm_sfence vs __faststorefence

According to MSDN documentation, __faststorefence is faster than _mm_sfence. In my timings it's over three times slower.

Platform: Win7-64, Visual Studio 2010 with the x64 SDK.

#include <windows.h>
#include <xmmintrin.h>
#include <intrin.h>
#include <iostream>

int main(int argc, char* argv[])
{
    int* x = new int;
    __int64 loops = 1000000000; // 1 billion
    __int64 start, elapsed;

    // Time a store followed by _mm_sfence.
    start = __rdtsc();
    for (__int64 i = 0; i < loops; i++)
    {
        *x = 0;
        _mm_sfence();
    }
    elapsed = __rdtsc() - start;

    std::cout << "_mm_sfence: " << elapsed << std::endl
              << "average   : " << double(elapsed) / double(loops) << std::endl;

    // Time a store followed by __faststorefence.
    start = __rdtsc();
    for (__int64 i = 0; i < loops; i++)
    {
        *x = 0;
        __faststorefence();
    }
    elapsed = __rdtsc() - start;

    std::cout << "__faststorefence: " << elapsed << std::endl
              << "average         : " << double(elapsed) / double(loops) << std::endl;

    delete x;
    return 0;
}

Results (average __rdtsc ticks per iteration):

_mm_sfence average: 5.7
__faststorefence average: 18.37

In the generated code, __faststorefence becomes lock or DWORD PTR [rsp], ebp (where ebp has been xor'ed to zero), and _mm_sfence becomes sfence, unsurprisingly.

The MSDN docs for __faststorefence explicitly state that it's faster than _mm_sfence, so either my test is wrong or they are. Any ideas?

Upvotes: 3

Answers (2)

jpark37

Reputation: 11

The AMD processors I tried with the provided benchmark showed __faststorefence as the winner.

Intel - _mm_sfence: 8.61, __faststorefence: 21.60
AMD 1 - _mm_sfence: 138.21, __faststorefence: 90.96
AMD 2 - _mm_sfence: 55.21, __faststorefence: 20.08

This was with VS 2013.
_mm_sfence = sfence
__faststorefence = lock or dword ptr [rsp],esi

Upvotes: 1

aracntido

Reputation: 253

You cannot meaningfully compare __faststorefence (a full fence) with _mm_sfence (a store-only fence).

You need to compare __faststorefence (a full fence) with _mm_mfence (also a full fence).
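
A rough sketch of that like-for-like comparison, in case it helps. The helper names (average_ns_per_iteration, full_fence_a, full_fence_b) are made up for illustration and are not part of any SDK. It uses std::chrono instead of __rdtsc so it builds outside MSVC too, and it falls back to std::atomic_thread_fence where an intrinsic is unavailable, so off MSVC x64 the first fence is only a stand-in for __faststorefence.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>      // __faststorefence (MSVC x64 only)
#endif
#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>   // _mm_mfence (SSE2)
#endif

// Hypothetical helper: average nanoseconds per "store, then fence" iteration.
template <typename Fence>
double average_ns_per_iteration(std::int64_t loops, Fence fence)
{
    // volatile keeps the compiler from dropping the repeated stores.
    volatile int target = 0;

    const auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < loops; ++i)
    {
        target = 0;
        fence();
    }
    const std::chrono::duration<double, std::nano> elapsed =
        std::chrono::steady_clock::now() - start;

    return elapsed.count() / static_cast<double>(loops);
}

// Full fence A: __faststorefence on MSVC x64; elsewhere a portable
// sequentially consistent fence is used as a stand-in (an assumption).
inline void full_fence_a()
{
#if defined(_MSC_VER) && defined(_M_X64)
    __faststorefence();
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Full fence B: _mm_mfence on x64; elsewhere the same portable stand-in.
inline void full_fence_b()
{
#if defined(__x86_64__) || defined(_M_X64)
    _mm_mfence();
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}
```

Call it with lambdas so the fence can be inlined, e.g. average_ns_per_iteration(1000000000, [] { full_fence_a(); }) and the same with full_fence_b, then compare the two averages. The numbers will vary by CPU, as the Intel/AMD results in the other answer show.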

Upvotes: 0
