Reputation: 138
I'd like my configure script to detect the availability of the POPCNT instruction across a wide variety of Unix-like systems. At the moment I do these tests:
The greps are done case-independently. Can you see any problems with these, and do you know of any other tests?
Tests requiring root privilege are no use.
Upvotes: 2
Views: 1758
Reputation: 1
If you are testing for POPCNT before installing Windows 11 and you have Git Bash installed on your machine, then type the following command in Git Bash:
grep -i popcnt /proc/cpuinfo | uniq
and if it says:
flags : fpu ... sse4_1 sse4_2 popcnt
then you are good to go.
Generally:
Intel Core i5 and Intel Core i7 have the popcnt instruction.
Intel Core 2 Duo does not have it.
Upvotes: 0
Reputation: 365457
So you have code that enables -mpopcnt and uses __builtin_popcount if that will be fast. Otherwise you use something different, because your custom solution beats gcc's implementation?
Keep in mind that host != target in some cases. Build-time CPU detection is not appropriate for making binaries that have to run on other machines, e.g. Linux distros building packages. Cross-compiling is also a thing, and is commonly done when targeting an embedded system or an old slow system.
Maybe write a custom C program that returns the result you want.
On x86, you could just use the result of runtime CPU detection: run the CPUID instruction and check if popcnt is supported. It's probably best not to unconditionally run the popcnt instruction, since processes that run an illegal instruction generate a syslog entry on some modern distros (e.g. Ubuntu).
With recent GNU C extensions, the easiest way to do that is: __builtin_cpu_init() and __builtin_cpu_supports("popcnt"), saving you the trouble of manually decoding the CPUID results.
You could then fall back to a micro-benchmark of __builtin_popcount against your custom macro, and take whichever is faster. That might be useful even on non-x86 architectures where your macros beat gcc's implementation (e.g. an architecture that always has a popcnt instruction available). Then you'd have to handle the case where you should use __builtin_popcount but not build with -mpopcnt.
Upvotes: 1