Reputation: 90
To keep the example short, assume the warp size is 8. I have the mask 10110110, returned by the __ballot function, like this:
int cond = xxxx ? 1 : 0;
mask = __ballot(cond);
Now, for each thread that satisfies the condition, I need its relative position among the threads that satisfy it.
In the example above, lane ids {1,2,4,5,7} satisfy the condition. How can I calculate the relative position from the mask? For example, I want a function that behaves like this:
mask = 10110110
function(mask, 1) -> 0
function(mask, 2) -> 1
function(mask, 4) -> 2
function(mask, 5) -> 3
function(mask, 7) -> 4
How can I implement this function with bitwise operations?
Upvotes: 0
Views: 207
Reputation: 1329
To get the relative position, I would keep only the part of your ballot mask below the given position and count the set bits that remain. Using CUDA's __popc to count bits, this is as easy as
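__device__ int function(unsigned int mask, int pos)
{
    // Bits 0 .. pos-1 set; unsigned, so pos = 31 does not overflow a signed int.
    unsigned int m = (1u << pos) - 1u;
    // Count the set bits of mask below pos.
    return __popc(mask & m);
}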
That way, you count the set bits from the rightmost bit up to, but not including, the bit at the given pos, which is the relative position of the set bit as you described it. Note that the bit at pos itself is not counted, only the set bits below it.
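To check this against the numbers in the question without a GPU, here is a minimal host-side sketch in plain C++. The names rank_in_mask and matches_question_example are made up for illustration, std::bitset::count() stands in for __popc, and pos is assumed to lie in [0, 31].

```cpp
#include <bitset>
#include <cstdint>

// Host-side stand-in for the device function (hypothetical name).
// std::bitset::count() plays the role of __popc here.
int rank_in_mask(std::uint32_t mask, int pos)
{
    // Keep only the bits strictly below pos; pos is assumed to be in [0, 31].
    const std::uint32_t below = (std::uint32_t{1} << pos) - 1u;
    return static_cast<int>(std::bitset<32>(mask & below).count());
}

// Checks the five cases from the question for mask 10110110.
bool matches_question_example()
{
    const std::uint32_t mask = 0xB6u; // binary 10110110, lanes 1,2,4,5,7 set
    return rank_in_mask(mask, 1) == 0
        && rank_in_mask(mask, 2) == 1
        && rank_in_mask(mask, 4) == 2
        && rank_in_mask(mask, 5) == 3
        && rank_in_mask(mask, 7) == 4;
}
```

For lane 5, for instance, the low-bit mask is 00011111, the masked value is 00010110, and it has three set bits, so the lane gets position 3.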
In case you can't or don't want to use __popc, you can look at implementations of the Hamming weight for bit-operation-only (and therefore portable) code.
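As one possible illustration of such a portable version, the sketch below uses the well-known parallel bit-counting trick for 32-bit words in place of __popc. It is plain C++ for the host; popcount32 and rank_in_mask_portable are hypothetical names, and pos is again assumed to be in [0, 31].

```cpp
#include <cstdint>

// Bit-operation-only Hamming weight of a 32-bit word (hypothetical helper).
std::uint32_t popcount32(std::uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 // sums of adjacent bit pairs
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // sums per 4-bit group
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 // sums per byte
    return (x * 0x01010101u) >> 24;                   // add the four bytes together
}

// Same ranking idea as with __popc: count the set bits strictly below pos.
int rank_in_mask_portable(std::uint32_t mask, int pos)
{
    const std::uint32_t below = (std::uint32_t{1} << pos) - 1u;
    return static_cast<int>(popcount32(mask & below));
}
```

On the GPU, __popc is normally the better choice; this variant is only useful where the intrinsic is not available.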
Upvotes: 3