Reputation: 10153
I am interested in opencl implementation of kernel which counts set(1) bits in unsigned int I know that there is such extension for opencl and I don`t want to use it ,but implement by myself
Upvotes: 2
Views: 1133
Reputation:
This is not the exact function you are looking for. But since no one has posted OpenCL code, I will add it. It is OpenCL bit count code for 256 bit integers rather than for 32 bit as you requested. The code is from here. It uses one of the well known algorithms pointed out by Dithermaster. It should not be difficult to convert to 32-bit.
//
// popcnt256 - return population count for 256-bit value
//
uint popcnt256 (ulong4 vreg)
{
const ulong4 m1 = (ulong4)(0x5555555555555555,0x5555555555555555,0x5555555555555555,0x5555555555555555);
const ulong4 m2 = (ulong4)(0x3333333333333333,0x3333333333333333,0x3333333333333333,0x3333333333333333);
const ulong4 m4 = (ulong4)(0x0f0f0f0f0f0f0f0f,0x0f0f0f0f0f0f0f0f,0x0f0f0f0f0f0f0f0f,0x0f0f0f0f0f0f0f0f);
const ulong4 h01 = (ulong4)(0x0101010101010101,0x0101010101010101,0x0101010101010101,0x0101010101010101);
vreg -= (vreg >> 1) & m1;
vreg = (vreg & m2) + ((vreg >> 2) & m2);
vreg = (vreg + (vreg >> 4)) & m4;
vreg = (vreg * h01) >> 56;
return vreg.s0 + vreg.s1 + vreg.s2 + vreg.s3;
}
Upvotes: 2
Reputation: 6343
Most of the old-school CPU-based tricks apply here as well, although any that loop will not be great on the GPU. Probably a (non-global memory) table is going to work best.
See:
How to count the number of set bits in a 32-bit integer?
http://www.geeksforgeeks.org/count-set-bits-in-an-integer/
http://gurmeet.net/puzzles/fast-bit-counting-routines/
Upvotes: 0