Reputation: 9278
According to the MSDN documentation, __faststorefence is faster than _mm_sfence. In my timings it's over three times slower.
Platform: Win7-64, Visual Studio 2010 with the x64 SDK.
#include <windows.h>
#include <xmmintrin.h>  // _mm_sfence
#include <intrin.h>     // __rdtsc, __faststorefence
#include <iostream>     // std::cout, std::endl

int main(int argc, char* argv[])
{
    int* x = new int;
    __int64 loops = 1000000000; // 1 billion
    __int64 start, elapsed;

    // Time one store followed by _mm_sfence per iteration.
    start = __rdtsc();
    for (__int64 i = 0; i < loops; i++)
    {
        *x = 0;
        _mm_sfence();
    }
    elapsed = __rdtsc() - start;
    std::cout << "_mm_sfence: " << elapsed << std::endl
              << "average : " << double(elapsed) / double(loops) << std::endl;

    // Time one store followed by __faststorefence per iteration.
    start = __rdtsc();
    for (__int64 i = 0; i < loops; i++)
    {
        *x = 0;
        __faststorefence();
    }
    elapsed = __rdtsc() - start;
    std::cout << "__faststorefence: " << elapsed << std::endl
              << "average : " << double(elapsed) / double(loops) << std::endl;

    delete x;
    return 0;
}
Results:
__faststorefence generates lock or DWORD PTR [rsp], ebp (where ebp has been xor'ed to zero), and _mm_sfence generates sfence (unsurprisingly).
The MSDN docs for __faststorefence explicitly state that it's faster than _mm_sfence, so either my test is wrong or they are. Any ideas?
Upvotes: 3
Views: 1685
Reputation: 11
The AMD processors I tried with the provided benchmark showed __faststorefence as the winner. Averages per iteration, as printed by the benchmark:
Intel - _mm_sfence: 8.61, __faststorefence: 21.60
AMD 1 - _mm_sfence: 138.21, __faststorefence: 90.96
AMD 2 - _mm_sfence: 55.21, __faststorefence: 20.08
This was with VS 2013.
_mm_sfence = sfence
__faststorefence = lock or dword ptr [rsp],esi
Upvotes: 1
Reputation: 253
You cannot compare __faststorefence (a full fence) with _mm_sfence (a store fence).
You need to compare __faststorefence (a full fence) with _mm_mfence (mfence, also a full fence).
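To make that comparison concrete, here is a minimal sketch of a timing helper that can be pointed at any of the three fences, assuming an x86-64 target. The names ticks_per_iteration, full_fence_locked, full_fence_mfence and store_fence are made up for illustration. On compilers other than MSVC there is no __faststorefence, so that branch uses an inline-asm locked OR on the stack as a stand-in; this is an assumption based on the lock or sequence shown earlier in the thread, not the intrinsic itself.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>   // _mm_mfence (SSE2)
#include <xmmintrin.h>   // _mm_sfence (SSE)
#if defined(_MSC_VER)
#include <intrin.h>      // __rdtsc, __faststorefence
#else
#include <x86intrin.h>   // __rdtsc on GCC/Clang
#endif

// Full fence implemented as a locked read-modify-write on the stack.
static inline void full_fence_locked()
{
#if defined(_MSC_VER)
    __faststorefence();
#else
    // Assumption: stand-in for __faststorefence on non-MSVC compilers.
    // OR-ing with 0 leaves the stack slot unchanged; the lock prefix
    // provides the ordering. x86-64 only.
    __asm__ __volatile__("lock; orl $0, (%%rsp)" ::: "memory", "cc");
#endif
}

// Full fence via the dedicated mfence instruction.
static inline void full_fence_mfence() { _mm_mfence(); }

// Store-only fence, kept for reference against the original benchmark.
static inline void store_fence() { _mm_sfence(); }

// Hypothetical helper: runs `loops` iterations of "store, then fence"
// and returns the average number of TSC ticks per iteration.
template <typename Fence>
double ticks_per_iteration(Fence fence, std::int64_t loops)
{
    assert(loops > 0);
    volatile int target = 0;  // volatile so the store is not optimized away
    const std::uint64_t start = __rdtsc();
    for (std::int64_t i = 0; i < loops; ++i) {
        target = 0;
        fence();
    }
    const std::uint64_t elapsed = __rdtsc() - start;
    return static_cast<double>(elapsed) / static_cast<double>(loops);
}
```

Calling ticks_per_iteration(full_fence_locked, loops) and ticks_per_iteration(full_fence_mfence, loops) with the same loop count gives the like-for-like numbers; store_fence is only there to reproduce the figures from the question. The fence is passed as a template argument so the same loop is reused for each one, which keeps loop overhead out of the comparison as far as the compiler allows.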
Upvotes: 0