Brendan McKay

Reputation: 138

Portably testing for the POPCNT instruction

I'd like my configure script to detect the availability of the POPCNT instruction across a wide variety of Unix-like systems. At the moment I do these tests:

  1. Look for "popcnt" in /proc/cpuinfo. This works in Linux and Cygwin.
  2. Look for "popcnt" in the output of "sysctl -n machdep.cpu.features". This works in Mac OS X and (untested) BSD.
  3. Look for "popcnt" in the output of "isainfo -v -x". This works (untested) in Solaris.

The greps are done case-independently. Can you see any problems with these, and do you know of any other tests?

Tests requiring root privilege are no use.
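As a rough illustration of test 1 (a sketch, not the asker's actual script), a configure script could compile and run a small C program that does the case-independent /proc/cpuinfo search itself. The names cpuinfo_has_popcnt and line_has_word are hypothetical, and the code assumes a POSIX system that provides getline(). It matches "popcnt" only as a whole whitespace-delimited word, so a longer flag that merely contains those letters is not counted.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive search for `word` as a whole whitespace-delimited token. */
static int line_has_word(const char *line, const char *word)
{
    size_t n = strlen(word);
    for (const char *p = line; *p != '\0'; p++) {
        /* A token starts at the beginning of the line or after whitespace. */
        if (p != line && !isspace((unsigned char)p[-1]))
            continue;
        size_t i = 0;
        while (i < n && p[i] != '\0' &&
               tolower((unsigned char)p[i]) == tolower((unsigned char)word[i]))
            i++;
        /* Whole-word match: the token must also end at whitespace or NUL. */
        if (i == n && (p[n] == '\0' || isspace((unsigned char)p[n])))
            return 1;
    }
    return 0;
}

/* Hypothetical helper: 1 if any line of the file lists "popcnt",
 * 0 if none does, -1 if the file cannot be opened (e.g. not Linux/Cygwin). */
static int cpuinfo_has_popcnt(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = NULL;
    size_t cap = 0;
    int found = 0;
    /* getline() grows the buffer, so long "flags" lines are read whole. */
    while (!found && getline(&line, &cap, f) != -1)
        found = line_has_word(line, "popcnt");
    free(line);
    fclose(f);
    return found;
}
```

A configure test could call cpuinfo_has_popcnt("/proc/cpuinfo") from main and map the result to an exit status; tests 2 and 3 (sysctl and isainfo) would still be done from the shell.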

Upvotes: 2

Views: 1758

Answers (2)

Fred

Reputation: 1

If you are checking for POPCNT before installing Windows 11 and you have Git Bash installed on your machine, type the following command in Git Bash:

grep -i -w popcnt /proc/cpuinfo | uniq

and if it says:

flags : fpu ... sse4_1 sse4_2 popcnt

then you are good to go.

Generally, Intel Core i5 and Core i7 processors have the POPCNT instruction; Intel Core 2 Duo processors do not.

Upvotes: 0

Peter Cordes

Reputation: 365457

So you have code that enables -mpopcnt and uses __builtin_popcount if that will be fast. Otherwise you use something different, because your custom solution beats gcc's implementation?
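The setup described here might look like the following compile-time selection. GCC and Clang predefine __POPCNT__ when -mpopcnt (or an -march that implies it) is in effect. The fallback my_popcount32 is a hypothetical stand-in for the asker's custom code; this sketch uses the clear-lowest-set-bit loop rather than whatever the asker actually wrote.

```c
#include <stdint.h>

/* Hypothetical stand-in for a custom fallback: clear the lowest set bit
 * until none remain, counting the iterations. */
static int my_popcount32(uint32_t x)
{
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

#ifdef __POPCNT__
/* Built with -mpopcnt: the builtin compiles to a single POPCNT instruction. */
#define POPCOUNT32(x) __builtin_popcount(x)
#else
/* Without -mpopcnt, use the custom routine instead of GCC's generic code. */
#define POPCOUNT32(x) my_popcount32(x)
#endif
```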

Keep in mind that host != target in some cases. Build-time CPU detection is not appropriate for making binaries that have to run on other machines, e.g. when Linux distros build packages. Cross-compiling is also common, especially when targeting an embedded system or an old, slow system.


Maybe write a custom C program that returns the result you want.

On x86, you could just use the result of runtime CPU detection: run the CPUID instruction and check if popcnt is supported. It's probably best not to unconditionally run the popcnt instruction, since processes that run an illegal instruction generate a syslog entry on some modern distros (e.g. Ubuntu).
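A minimal sketch of that CPUID check, assuming GCC or Clang on x86, whose <cpuid.h> provides __get_cpuid() and the bit_POPCNT mask (CPUID leaf 1, ECX bit 23). The function name has_popcnt_cpuid is hypothetical. It never executes POPCNT itself, so an unsupported CPU cannot fault; on other architectures it simply reports 0.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: 1 if CPUID reports POPCNT support, else 0. */
static int has_popcnt_cpuid(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid() returns 0 if leaf 1 is not available. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & bit_POPCNT) != 0;
#else
    return 0; /* not x86: no CPUID instruction to query */
#endif
}
```

For a configure test, main could return has_popcnt_cpuid() ? 0 : 1 so the shell can check the exit status. The answer only describes the machine doing the build, so the host != target caveat still applies.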

With recent GNU C extensions, the easiest way to do that is to call __builtin_cpu_init() and then test __builtin_cpu_supports("popcnt"), which saves you the trouble of decoding the CPUID results manually.
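A sketch of how those builtins might drive runtime dispatch, assuming GCC or Clang. The target("popcnt") function attribute lets one function use POPCNT without compiling the whole file with -mpopcnt. The names popcount_hw, popcount_generic, popcount64 and init_popcount are hypothetical, and the x86 guards make other architectures fall back to the generic builtin.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
/* Only this function is compiled as if -mpopcnt were given. */
__attribute__((target("popcnt")))
static int popcount_hw(uint64_t x) { return __builtin_popcountll(x); }
#endif

/* Compiled with the file's default flags (no POPCNT assumed). */
static int popcount_generic(uint64_t x) { return __builtin_popcountll(x); }

/* Callers use this pointer; it starts out at the safe version. */
static int (*popcount64)(uint64_t) = popcount_generic;

static void init_popcount(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init(); /* needed if called before constructors run */
    if (__builtin_cpu_supports("popcnt"))
        popcount64 = popcount_hw;
#endif
}
```

Calling init_popcount() once at startup and then popcount64(x) everywhere keeps a single binary correct on CPUs with and without POPCNT, which avoids the build-time detection problem for distributed binaries.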


You could then fall back to a micro-benchmark of __builtin_popcount against your custom macro, and take whichever is faster. That might be useful even on non-x86 architectures where your macros beat gcc's implementation (e.g. an architecture that always has a popcnt instruction available). Then you'd have to handle the case where you should use __builtin_popcount but not build with -mpopcnt.
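A crude sketch of that micro-benchmark in standard C using clock(). popcount_swar is a hypothetical stand-in for the asker's custom macro (here the classic SWAR bit count), and the inputs come from a xorshift32 generator whose results feed a checksum, so the work cannot be discarded. Both candidates are called through the same function-pointer path; a real configure test would want more repetitions and should be built with the same optimization flags as the final program.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a custom popcount macro: SWAR bit counting. */
static int popcount_swar(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums */
    return (int)((x * 0x01010101u) >> 24);            /* add the 4 bytes */
}

static int popcount_builtin(uint32_t x)
{
    return __builtin_popcount(x);
}

/* Time n calls of f on xorshift32 inputs; the checksum goes to *sum so the
 * loop's work is observable and cannot be optimized away. */
static double time_popcount(int (*f)(uint32_t), long n, unsigned long *sum)
{
    uint32_t x = 2463534242u; /* any nonzero seed works for xorshift32 */
    unsigned long s = 0;
    clock_t t0 = clock();
    for (long i = 0; i < n; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        s += (unsigned long)f(x);
    }
    clock_t t1 = clock();
    *sum = s;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* 1 if __builtin_popcount was at least as fast, 0 if the custom version won,
 * -1 if the two disagree on the checksum (a bug in one of them). */
static int prefer_builtin(long n)
{
    unsigned long s_builtin, s_swar;
    double t_builtin = time_popcount(popcount_builtin, n, &s_builtin);
    double t_swar = time_popcount(popcount_swar, n, &s_swar);
    if (s_builtin != s_swar)
        return -1;
    return t_builtin <= t_swar;
}
```

Like the CPUID check, the outcome describes only the build machine, so it is not a good basis for binaries meant to run elsewhere.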

Upvotes: 1
