I am having some trouble arranging the threads to match my 2D data array.
It is a compact array in which every integer packs 32 bit values (e.g. [1000110001000000010000000000010]) representing transactions, and I need to count the set bits row-wise (I have used integers instead of a bit vector/bitset). The array has dimensions 1000 × 3125, so every row contains 3125 × 32 = 100,000 (1 lakh) bit values.
I need to count the total number of bits set to 1 in each row, i.e. across all 3125 columns of that row. How should I arrange the threads/loops for optimal performance?
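For reference, the computation being asked for can be written as a plain serial loop. This is a minimal host-side sketch, assuming the matrix is stored row-major as unsigned 32-bit words (`uint32_t`) rather than signed `int`; the names `countBits32` and `rowBitCountsCpu` are hypothetical. It is mainly useful as a correctness check for a GPU version.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable bit count for one 32-bit word (Kernighan's trick: each
// iteration clears the lowest set bit).
static int countBits32(uint32_t x) {
    int n = 0;
    while (x) {
        x &= x - 1;
        ++n;
    }
    return n;
}

// Serial reference: `words` is a row-major rows x cols matrix of packed
// 32-bit words; returns the number of set bits in each row.
std::vector<int> rowBitCountsCpu(const std::vector<uint32_t>& words,
                                 int rows, int cols) {
    assert(words.size() == static_cast<std::size_t>(rows) * cols);
    std::vector<int> counts(rows, 0);
    for (int r = 0; r < rows; ++r) {
        int sum = 0;
        for (int c = 0; c < cols; ++c) {
            sum += countBits32(words[static_cast<std::size_t>(r) * cols + c]);
        }
        counts[r] = sum;
    }
    return counts;
}
```

A row of 100,000 bits can hold at most 100,000 set bits, so an `int` per row is enough.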
You can use a standard parallel reduction approach. You would do one parallel reduction per row of your matrix. The only difference is that each thread will need to pick up a 32-bit value and compute the number of set bits first.
Counting the set bits is easy using the __popc() intrinsic, which returns the number of bits set in a 32-bit parameter.
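A minimal sketch of this approach, assuming one thread block per row, 256 threads per block (any power of two would do) and a row-major `uint32_t` layout; `rowBitCounts`, `rowBitCountsGpu`, `check` and `kThreads` are hypothetical names. Each thread accumulates `__popc()` over a strided subset of the row's words, so adjacent threads read adjacent words (coalesced loads), and the block then reduces the per-thread partial sums in shared memory.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // threads per block; must be a power of two

// One block per row: each thread sums __popc() over a strided subset of the
// row's words, then the block reduces the partial sums in shared memory.
__global__ void rowBitCounts(const uint32_t* __restrict__ words, int cols,
                             int* __restrict__ counts) {
    __shared__ int partial[kThreads];
    const int tid = threadIdx.x;
    const uint32_t* row = words + static_cast<std::size_t>(blockIdx.x) * cols;

    // Adjacent threads read adjacent words, so global loads are coalesced.
    int sum = 0;
    for (int c = tid; c < cols; c += kThreads) {
        sum += __popc(row[c]);
    }
    partial[tid] = sum;
    __syncthreads();

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = kThreads / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();
    }
    if (tid == 0) {
        counts[blockIdx.x] = partial[0];
    }
}

// Turn a CUDA runtime error into a C++ exception.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        throw std::runtime_error(cudaGetErrorString(err));
    }
}

// Host wrapper: copies the row-major rows x cols matrix to the device,
// launches one block per row and returns the per-row set-bit counts.
std::vector<int> rowBitCountsGpu(const std::vector<uint32_t>& h_words,
                                 int rows, int cols) {
    assert(h_words.size() == static_cast<std::size_t>(rows) * cols);
    const std::size_t bytes = h_words.size() * sizeof(uint32_t);
    uint32_t* d_words = nullptr;
    int* d_counts = nullptr;
    check(cudaMalloc(&d_words, bytes));
    check(cudaMalloc(&d_counts, rows * sizeof(int)));
    check(cudaMemcpy(d_words, h_words.data(), bytes, cudaMemcpyHostToDevice));

    rowBitCounts<<<rows, kThreads>>>(d_words, cols, d_counts);
    check(cudaGetLastError());

    std::vector<int> h_counts(rows);
    check(cudaMemcpy(h_counts.data(), d_counts, rows * sizeof(int),
                     cudaMemcpyDeviceToHost));
    cudaFree(d_words);
    cudaFree(d_counts);
    return h_counts;
}
```

For a 1000 × 3125 matrix this launches 1000 blocks, and each thread handles 12 or 13 of the 3125 words in its row. The block size is a tuning parameter; other values should be checked with a profiler.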
For the parallel reduction part, if you're looking for the fastest possible performance, use CUB instead of writing your own.
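The same one-block-per-row kernel with the in-block reduction delegated to CUB might look like the sketch below. It assumes the CUB headers that ship with the CUDA Toolkit (`<cub/cub.cuh>`) and uses `cub::BlockReduce`; the names `rowBitCountsCub`, `rowBitCountsCubGpu`, `check` and the block size of 256 are illustrative choices, not part of the answer above.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

constexpr int kThreads = 256;  // threads per block (compile-time for CUB)

// One block per row; cub::BlockReduce replaces the hand-written
// shared-memory tree for summing the per-thread partial counts.
__global__ void rowBitCountsCub(const uint32_t* __restrict__ words, int cols,
                                int* __restrict__ counts) {
    using BlockReduce = cub::BlockReduce<int, kThreads>;
    __shared__ typename BlockReduce::TempStorage temp;

    const int tid = threadIdx.x;
    const uint32_t* row = words + static_cast<std::size_t>(blockIdx.x) * cols;
    int sum = 0;
    for (int c = tid; c < cols; c += kThreads) {
        sum += __popc(row[c]);
    }
    // Sum() returns the block-wide total; it is only valid in thread 0.
    int total = BlockReduce(temp).Sum(sum);
    if (tid == 0) {
        counts[blockIdx.x] = total;
    }
}

// Turn a CUDA runtime error into a C++ exception.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        throw std::runtime_error(cudaGetErrorString(err));
    }
}

// Host wrapper: one block per row, returns the per-row set-bit counts.
std::vector<int> rowBitCountsCubGpu(const std::vector<uint32_t>& h_words,
                                    int rows, int cols) {
    assert(h_words.size() == static_cast<std::size_t>(rows) * cols);
    const std::size_t bytes = h_words.size() * sizeof(uint32_t);
    uint32_t* d_words = nullptr;
    int* d_counts = nullptr;
    check(cudaMalloc(&d_words, bytes));
    check(cudaMalloc(&d_counts, rows * sizeof(int)));
    check(cudaMemcpy(d_words, h_words.data(), bytes, cudaMemcpyHostToDevice));

    rowBitCountsCub<<<rows, kThreads>>>(d_words, cols, d_counts);
    check(cudaGetLastError());

    std::vector<int> h_counts(rows);
    check(cudaMemcpy(h_counts.data(), d_counts, rows * sizeof(int),
                     cudaMemcpyDeviceToHost));
    cudaFree(d_words);
    cudaFree(d_counts);
    return h_counts;
}
```

CUB picks the reduction strategy internally (for example, warp-level shuffles), which is the main reason to prefer it over a hand-rolled tree; the actual speedup for this matrix size should be confirmed by profiling.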