user3711054

Reputation: 11

CUDA programming: bitwise count rowwise

I am having trouble arranging threads to match my 2D data array.

It is a compact array in which every integer packs 32 bit values [1000110001000000010000000000010] representing transactions, and I need to count the set bits row-wise (I have used integers instead of a bit vector/bitset). The array has dimensions 1000*3125, so every row contains 1 lakh (100,000) bit values.

I need to count the total number of bits set to 1 in each row, i.e. across the 3125 columns of each row. How should I arrange the threads and loops for optimum performance?

Upvotes: 1

Views: 1099

Answers (1)

Robert Crovella

Reputation: 151954

You can use a standard parallel reduction approach. You would do one parallel reduction per row of your matrix. The only difference is that each thread will need to pick up a 32-bit value and compute the number of set bits first.

Counting the set bits is easy using the __popc() intrinsic, which returns the number of bits set in a 32-bit parameter.
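As a rough sketch of what that could look like (not a tuned implementation): one thread block per row, each thread popcounts a strided subset of the row's words, and the block then sums the partial counts with a shared-memory tree reduction. The names rowBitCount, countBitsPerRow and THREADS are made up for illustration, the block size of 256 is an assumption, and error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int THREADS = 256;  // assumed block size; must be a power of two

// One block per row. Each thread popcounts a strided subset of the row,
// then the block sums the per-thread counts in shared memory.
__global__ void rowBitCount(const unsigned int *data, int *counts, int cols)
{
    __shared__ int partial[THREADS];
    const unsigned int *row = data + (size_t)blockIdx.x * cols;

    int sum = 0;
    for (int c = threadIdx.x; c < cols; c += blockDim.x)
        sum += __popc(row[c]);          // set bits in one 32-bit word
    partial[threadIdx.x] = sum;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        counts[blockIdx.x] = partial[0];
}

// Hypothetical host helper: copies a row-major matrix to the GPU,
// launches one block per row, and returns the per-row counts.
std::vector<int> countBitsPerRow(const std::vector<unsigned int> &h,
                                 int rows, int cols)
{
    unsigned int *dData = nullptr;
    int *dCounts = nullptr;
    cudaMalloc(&dData, h.size() * sizeof(unsigned int));
    cudaMalloc(&dCounts, rows * sizeof(int));
    cudaMemcpy(dData, h.data(), h.size() * sizeof(unsigned int),
               cudaMemcpyHostToDevice);

    rowBitCount<<<rows, THREADS>>>(dData, dCounts, cols);

    std::vector<int> out(rows);
    cudaMemcpy(out.data(), dCounts, rows * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dCounts);
    return out;
}
```

For the matrix in the question this would be called as countBitsPerRow(h, 1000, 3125), i.e. 1000 blocks of 256 threads, with adjacent threads reading adjacent words so the loads are coalesced.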

For the parallel reduction part, if you're looking for the fastest possible performance, use CUB instead of writing your own.
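One way this might look with CUB, as a sketch only: keep one block per row and let cub::BlockReduce do the block-wide sum in place of a hand-written shared-memory loop. The names rowBitCountCub, countBitsPerRowCub and BLOCK_THREADS are hypothetical, the block size of 256 is an assumption, and error checking is omitted. CUB also offers device-wide segmented reductions, which may be worth benchmarking against this.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

constexpr int BLOCK_THREADS = 256;  // assumed block size

// One block per row; cub::BlockReduce sums the per-thread popcounts.
__global__ void rowBitCountCub(const unsigned int *data, int *counts, int cols)
{
    using BlockReduce = cub::BlockReduce<int, BLOCK_THREADS>;
    __shared__ typename BlockReduce::TempStorage temp;

    const unsigned int *row = data + (size_t)blockIdx.x * cols;

    int sum = 0;
    for (int c = threadIdx.x; c < cols; c += BLOCK_THREADS)
        sum += __popc(row[c]);          // set bits in one 32-bit word

    // The block-wide total is only valid in thread 0.
    int total = BlockReduce(temp).Sum(sum);
    if (threadIdx.x == 0)
        counts[blockIdx.x] = total;
}

// Hypothetical host helper mirroring a plain launch: one block per row.
std::vector<int> countBitsPerRowCub(const std::vector<unsigned int> &h,
                                    int rows, int cols)
{
    unsigned int *dData = nullptr;
    int *dCounts = nullptr;
    cudaMalloc(&dData, h.size() * sizeof(unsigned int));
    cudaMalloc(&dCounts, rows * sizeof(int));
    cudaMemcpy(dData, h.data(), h.size() * sizeof(unsigned int),
               cudaMemcpyHostToDevice);

    rowBitCountCub<<<rows, BLOCK_THREADS>>>(dData, dCounts, cols);

    std::vector<int> out(rows);
    cudaMemcpy(out.data(), dCounts, rows * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dCounts);
    return out;
}
```

The kernel must be launched with exactly BLOCK_THREADS threads per block, since the block size is a template parameter of cub::BlockReduce.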

Upvotes: 3
