Reputation: 141
I was trying to count how many 1 bits there are in 512 MB of memory, and I found two possible methods in GCC: _mm_popcnt_u64() and __builtin_popcountll().
_mm_popcnt_u64() is said to use the SSE4.2 CPU instruction (POPCNT), which seems to be the fastest option, while __builtin_popcountll() is expected to use a table lookup. So I thought __builtin_popcountll() should be a little slower than _mm_popcnt_u64().
However, I got a result like this:
Both methods took almost the same time. I strongly suspect that they work the same way.
I also found this in popcntintrin.h:

```c
/* Calculate a number of bits set to 1. */
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_popcnt_u32 (unsigned int __X)
{
  return __builtin_popcount (__X);
}

#ifdef __x86_64__
extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_popcnt_u64 (unsigned long long __X)
{
  return __builtin_popcountll (__X);
}
#endif
```
So I'm confused: how on earth does __builtin_popcountll() work?
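The comparison described in the question presumably boils down to two loops like the ones below. This is a minimal sketch, not the asker's code: the function names count_ones_builtin and count_ones_intrinsic are hypothetical, the buffer is assumed to be an array of uint64_t, and it assumes GCC or Clang on x86-64. The target("popcnt") attribute lets the intrinsic version compile even when the rest of the file is built without -mpopcnt; it should only be called on a CPU that actually supports POPCNT.

```c
#include <stddef.h>
#include <stdint.h>
#include <nmmintrin.h> /* _mm_popcnt_u64 (x86-64 only) */

/* Count set bits in an array of 64-bit words using the GCC builtin. */
uint64_t count_ones_builtin(const uint64_t *words, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)__builtin_popcountll(words[i]);
    return total;
}

/* Same loop using the SSE4.2-era intrinsic. The target attribute enables
   POPCNT for this function only; call it only if the CPU supports POPCNT. */
__attribute__((target("popcnt")))
uint64_t count_ones_intrinsic(const uint64_t *words, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)_mm_popcnt_u64(words[i]);
    return total;
}
```

Given the header excerpt above, where _mm_popcnt_u64 simply returns __builtin_popcountll, both loops would be expected to compile to the same instructions once POPCNT is enabled.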
Upvotes: 12
Views: 13843
Reputation: 485
Besides any other consideration, you cannot just time a loop once and use the raw number as a meaningful benchmark.
As a rule of thumb, if it doesn't have error bars (variance), it is not a proper measure of anything. Next time, try running each benchmark 10 (or 1000) times, compute the mean and standard deviation, and make sure one result is better or worse than the other with high statistical confidence, i.e. > 99.9%.
https://en.wikipedia.org/wiki/Standard_deviation#Estimation
And as a side note, a 0.1% difference in a benchmark should usually be considered statistical noise, especially if you are measuring CPU intrinsics or any other function that takes under 100 cycles to execute.
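A minimal sketch of the "repeat and report variance" approach, assuming a CPU-bound function under test. The names timing_stats, summarize and time_runs are hypothetical; timing uses standard C clock(), which measures processor time rather than wall-clock time. To keep it free of libm, it reports the sample variance; the standard deviation is its square root.

```c
#include <stddef.h>
#include <time.h>

/* Mean and sample variance of repeated timing runs. */
typedef struct {
    double mean;
    double variance; /* n - 1 denominator; stddev = sqrt(variance) */
} timing_stats;

/* Compute mean and sample variance of n >= 2 samples. */
timing_stats summarize(const double *samples, size_t n)
{
    timing_stats s = {0.0, 0.0};
    for (size_t i = 0; i < n; i++)
        s.mean += samples[i];
    s.mean /= (double)n;
    for (size_t i = 0; i < n; i++) {
        double d = samples[i] - s.mean;
        s.variance += d * d;
    }
    s.variance /= (double)(n - 1);
    return s;
}

/* Run fn(arg) `runs` times (runs >= 2), store each run's processor time
   in seconds in samples[], and summarize the results. */
timing_stats time_runs(void (*fn)(void *), void *arg, double *samples, size_t runs)
{
    for (size_t r = 0; r < runs; r++) {
        clock_t t0 = clock();
        fn(arg);
        clock_t t1 = clock();
        samples[r] = (double)(t1 - t0) / CLOCKS_PER_SEC;
    }
    return summarize(samples, runs);
}
```

Each candidate (for example the builtin loop and the intrinsic loop) would go through time_runs separately, and the difference in means is only worth reporting if it is large compared with the standard deviations (sqrt() from <math.h>, linked with -lm).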
Upvotes: 0
Reputation: 35
If you compile without a -march flag, i.e. with the default x86-64 target, the builtin should be slower, because it needs to dispatch to a function that selects between different architectures. That means no inlining and an additional condition.
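One way to see which path a given build takes is the __POPCNT__ macro, which GCC and Clang define when the target enables the POPCNT instruction (for example with -mpopcnt, or a -march value whose CPU has POPCNT). Comparing the output of gcc -O2 -S with and without -mpopcnt and looking for a popcnt instruction is another check. The SWAR function below is only a generic illustration of a software popcount, a swapped-in bit-twiddling technique rather than the table lookup mentioned in the question, and not necessarily what GCC or libgcc actually emits. Both function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 if this translation unit was compiled with POPCNT enabled. */
int popcnt_enabled_at_compile_time(void)
{
#ifdef __POPCNT__
    return 1;
#else
    return 0;
#endif
}

/* Portable SWAR popcount: sums bits in 2-bit, 4-bit, then 8-bit fields,
   and adds the eight byte counts with a multiply. */
int popcount64_swar(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}
```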
Upvotes: 1
Reputation:
_mm_popcnt_u64 is part of <nmmintrin.h>, a header devised by Intel for utility functions for accessing SSE 4.2 instructions. __builtin_popcountll is a GCC extension.
_mm_popcnt_u64 is portable to non-GNU compilers, and __builtin_popcountll is portable to non-SSE-4.2 CPUs. But on systems where both are available, both should compile to the exact same code.
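A minimal sketch of combining those two portability properties behind one hypothetical wrapper, popcount_u64. It assumes GCC/Clang on one side and another x86-64 compiler such as MSVC on the other; that branch assumes the compiler ships <nmmintrin.h> with _mm_popcnt_u64 and that the target CPU has POPCNT.

```c
#include <stdint.h>

#if defined(__GNUC__)
/* GCC and Clang: the builtin works on any target CPU; it becomes a single
   POPCNT instruction only when the target enables it (e.g. -mpopcnt). */
static inline int popcount_u64(uint64_t x)
{
    return __builtin_popcountll(x);
}
#elif defined(_M_X64)
/* Other x86-64 compilers (assumed: MSVC): use the Intel intrinsic,
   which requires a CPU with POPCNT at run time. */
#include <nmmintrin.h>
static inline int popcount_u64(uint64_t x)
{
    return (int)_mm_popcnt_u64(x);
}
#else
#error "this sketch has no popcount implementation for this compiler/target"
#endif
```

On GCC the first branch is always taken, which matches the header excerpt in the question: there, _mm_popcnt_u64 itself is just a wrapper around __builtin_popcountll.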
Upvotes: 21