U. Windl
U. Windl

Reputation: 4401

What's the sense of gcc's `__builtin_popcountll`?

I implemented bitsets (bitmaps) in C. Assuming that __builtin_popcountll is a highly efficient implementation I used that to count bits instead of doing my own implementation.

However when debugging the program it looks as if __builtin_popcountll is using some loop (and not what I had expected: Assembly instructions). Actually when profiling my test program with gprof, __popcountdi2 consumed 12% of the total CPU, while the "code around" using it took "0%".

So I wonder: What is the use of such builtin when it's seemingly so inefficient?

Platform is x86_64 (AMD EPYC 7401P) using gcc 4.8.5.

Upvotes: 0

Views: 997

Answers (1)

U. Windl
U. Windl

Reputation: 4401

Per https://stackoverflow.com/a/52161813/6607497 the code generated by __builtin_popcountll depends on the setting of gcc's option -march=CPU-TYPE:

  • For Intel CPUs the first to support the SSE4a popcnt instruction is CPU-TYPE corei7
  • For AMD CPUs it's CPU-TYPE amdfam10 or barcelona.
  • Using CPU-TYPE native will also enable it if the current CPU knows the instruction.

When set properly (i.e.: the popcnt instruction is known by the CPU), gcc creates corresponding inline assembly instructions; otherwise it will call __popcountdi2 (from libgcc2) that counts the bits performing a loop (one iteration per byte).

Upvotes: 1

Related Questions