Reputation: 4401
I implemented bitsets (bitmaps) in C.
Assuming that __builtin_popcountll is a highly efficient implementation, I used it to count bits instead of writing my own implementation.
However, when debugging the program, it looks as if __builtin_popcountll is using some loop (and not what I had expected: assembly instructions).
In fact, when profiling my test program with gprof, __popcountdi2 consumed 12% of the total CPU time, while the surrounding code that calls it took "0%".
So I wonder: What is the use of such a builtin when it is seemingly so inefficient?
Platform is x86_64 (AMD EPYC 7401P) using gcc 4.8.5.
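For illustration only, since the actual bitset code is not shown here, the kind of counting loop described could look roughly like the sketch below. The names bitset_t and bitset_count and the word-array layout are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitset layout: a plain array of 64-bit words. */
typedef struct {
    uint64_t *words;   /* storage for the bits */
    size_t    nwords;  /* number of 64-bit words in use */
} bitset_t;

/* Count all set bits by summing the population count of each word. */
size_t bitset_count(const bitset_t *bs)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < bs->nwords; i++)
        total += (size_t)__builtin_popcountll(bs->words[i]);
    return total;
}
```

With a loop like this, almost all of the time is spent inside whatever __builtin_popcountll expands to, which would be consistent with the profile described above.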
Upvotes: 0
Views: 997
Reputation: 4401
Per https://stackoverflow.com/a/52161813/6607497, the code generated for __builtin_popcountll depends on the setting of gcc's option -march=CPU-TYPE.
The popcnt instruction is enabled for CPU-TYPE corei7, amdfam10, or barcelona; native will also enable it if the current CPU supports the instruction.
When the option is set properly (i.e., the popcnt instruction is known to be available on the target CPU), gcc emits the corresponding instruction inline; otherwise it calls __popcountdi2 (from libgcc2), which counts the bits in a loop (one iteration per byte).
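To make the difference concrete, here is a minimal sketch that puts a byte-wise loop next to the builtin. The function popcount_bytewise is a hypothetical stand-in written for this example, not the actual libgcc2 source; it only mirrors the "one iteration per byte" idea. The helper check_popcounts is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of set bits in each 4-bit value 0..15. */
static const unsigned char nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Hypothetical fallback: one loop iteration per byte of the 64-bit word. */
int popcount_bytewise(uint64_t x)
{
    int n = 0;
    int i;

    for (i = 0; i < 8; i++) {
        unsigned b = (unsigned)(x & 0xff);
        n += nibble_bits[b & 0x0f] + nibble_bits[b >> 4];
        x >>= 8;
    }
    return n;
}

/* With a suitable -march= setting, gcc can compile this to a single
   popcnt instruction; without it, this becomes a call into libgcc. */
int popcount_builtin(uint64_t x)
{
    return __builtin_popcountll(x);
}

/* Sanity check: both variants must agree on a few sample values. */
void check_popcounts(void)
{
    static const uint64_t samples[] = {
        0ULL, 1ULL, 0xFFULL, 0x8000000000000001ULL, ~0ULL
    };
    size_t i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(popcount_bytewise(samples[i]) == popcount_builtin(samples[i]));
}
```

One way to see which variant gcc picked is to compile with -O2 -S, once without and once with -march=native (or -mpopcnt), and compare the assembly generated for popcount_builtin: a call to __popcountdi2 in the first case, a popcnt instruction in the second, assuming the CPU supports it.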
Upvotes: 1