I want to return both the selected and the not-selected parts of an array using cub::DeviceSelect. The only workaround that comes to mind is calling cub::DeviceSelect again with the opposite SelectOp, but I wonder if there is a more efficient way to do this.
A quick demo:
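#include <cub/cub.cuh>

struct Positive {
    __host__ __device__ __forceinline__
    bool operator()(const int a) const {
        return (a > 0);
    }
};

int main() {
    const int len = 6;
    int h_in[len] = {0, 2, -2, -1, 1, 3};
    int *d_in, *d_out, *d_num_select;
    cudaMalloc(&d_in, len * sizeof(int));
    cudaMalloc(&d_out, len * sizeof(int));
    cudaMalloc(&d_num_select, sizeof(int));
    cudaMemcpy(d_in, h_in, len * sizeof(int), cudaMemcpyHostToDevice);
    void *temp = NULL;
    size_t temp_size = 0;
    // I want to get something like
    // d_out = {2, 1, 3, 0, -2, -1} and *d_num_select = 3
    // but calling the code below only gives me d_out = {2, 1, 3}
    // First call: temp is NULL, so CUB only writes the required size to temp_size
    cub::DeviceSelect::If(temp, temp_size, d_in, d_out, d_num_select, len, Positive());
    cudaMalloc(&temp, temp_size);
    // Second call: performs the actual selection
    cub::DeviceSelect::If(temp, temp_size, d_in, d_out, d_num_select, len, Positive());
    cudaFree(temp); cudaFree(d_in); cudaFree(d_out); cudaFree(d_num_select);
    return 0;
}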
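For reference, the two-call workaround mentioned in the question could look roughly like this. It is a sketch under a few assumptions: NotPositive, select_into and two_pass_select are hypothetical names, thrust::device_vector is used only to manage device memory, and error checking is omitted. The second cub::DeviceSelect::If call writes the rejected items right after the selected ones, which requires copying the selected count back to the host in between.

```cuda
#include <cub/cub.cuh>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <vector>

struct Positive {
    __host__ __device__ __forceinline__
    bool operator()(const int a) const { return a > 0; }
};

// Hypothetical negation of Positive, used for the second pass.
struct NotPositive {
    __host__ __device__ __forceinline__
    bool operator()(const int a) const { return !(a > 0); }
};

// Hypothetical helper: runs cub::DeviceSelect::If with `op` and returns
// the number of selected items (copied back to the host).
template <typename SelectOp>
int select_into(const int *d_in, int *d_out, int len, SelectOp op) {
    thrust::device_vector<int> d_num(1);
    int *num = thrust::raw_pointer_cast(d_num.data());
    size_t temp_size = 0;
    // First call only queries the temporary storage size.
    cub::DeviceSelect::If(nullptr, temp_size, d_in, d_out, num, len, op);
    thrust::device_vector<char> temp(temp_size);
    // Second call performs the selection.
    cub::DeviceSelect::If(thrust::raw_pointer_cast(temp.data()), temp_size,
                          d_in, d_out, num, len, op);
    return d_num[0];  // device-to-host copy
}

// Hypothetical wrapper: selected items first, then the rest, both in input order.
std::vector<int> two_pass_select(const std::vector<int> &h_in, int *num_selected) {
    int len = static_cast<int>(h_in.size());
    thrust::device_vector<int> d_in(h_in.begin(), h_in.end());
    thrust::device_vector<int> d_out(len);
    const int *in = thrust::raw_pointer_cast(d_in.data());
    int *out = thrust::raw_pointer_cast(d_out.data());
    int n_sel = select_into(in, out, len, Positive());
    // The second pass writes right after the selected items.
    select_into(in, out + n_sel, len, NotPositive());
    *num_selected = n_sel;
    std::vector<int> h_out(len);
    thrust::copy(d_out.begin(), d_out.end(), h_out.begin());
    return h_out;
}
```

With the question's input, two_pass_select should return {2, 1, 3, 0, -2, -1} and set the count to 3. The cost is two full passes over the input plus a host round trip for the count.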
The algorithm that you are describing is called a (stable) partition in the C++ STL/Thrust/CUB.
At the moment CUB does not provide a stable partition, but the only thing keeping its partition (cub::DevicePartition) from being stable is that the elements of the second grouping (those for which the predicate returns false) are written in reverse order.
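A minimal sketch of the single-call partition with cub::DevicePartition::If, assuming the question's Positive predicate. The wrapper name cub_partition is hypothetical, thrust::device_vector is only used to manage device memory, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <vector>

struct Positive {
    __host__ __device__ __forceinline__
    bool operator()(const int a) const { return a > 0; }
};

// Hypothetical wrapper around cub::DevicePartition::If.
// Selected items are written to the front of the output in input order;
// unselected items are written to the back in reverse input order.
std::vector<int> cub_partition(const std::vector<int> &h_in, int *num_selected) {
    int len = static_cast<int>(h_in.size());
    thrust::device_vector<int> d_in(h_in.begin(), h_in.end());
    thrust::device_vector<int> d_out(len);
    thrust::device_vector<int> d_num(1);
    const int *in = thrust::raw_pointer_cast(d_in.data());
    int *out = thrust::raw_pointer_cast(d_out.data());
    int *num = thrust::raw_pointer_cast(d_num.data());
    size_t temp_size = 0;
    // First call only queries the temporary storage size.
    cub::DevicePartition::If(nullptr, temp_size, in, out, num, len, Positive());
    thrust::device_vector<char> temp(temp_size);
    // Second call performs the partition.
    cub::DevicePartition::If(thrust::raw_pointer_cast(temp.data()), temp_size,
                             in, out, num, len, Positive());
    *num_selected = d_num[0];  // device-to-host copy
    std::vector<int> h_out(len);
    thrust::copy(d_out.begin(), d_out.end(), h_out.begin());
    return h_out;
}
```

For the question's input {0, 2, -2, -1, 1, 3} this should give {2, 1, 3, -1, -2, 0} with a count of 3: the positive values keep their order, while 0, -2, -1 come out reversed.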
Thrust's CUDA backend uses the CUB implementation and the stable version currently just reverses the order of the second grouping in a second launch/pass.
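The partition-then-reverse approach can be sketched by hand as follows. This mirrors the idea described above, not Thrust's actual source: stable_partition_two_pass is a hypothetical name, thrust::reverse is used for the second pass, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/reverse.h>
#include <vector>

struct Positive {
    __host__ __device__ __forceinline__
    bool operator()(const int a) const { return a > 0; }
};

// Hypothetical stable partition: one cub::DevicePartition::If call,
// then a second pass that reverses the unselected tail.
std::vector<int> stable_partition_two_pass(const std::vector<int> &h_in,
                                           int *num_selected) {
    int len = static_cast<int>(h_in.size());
    thrust::device_vector<int> d_in(h_in.begin(), h_in.end());
    thrust::device_vector<int> d_out(len);
    thrust::device_vector<int> d_num(1);
    const int *in = thrust::raw_pointer_cast(d_in.data());
    int *out = thrust::raw_pointer_cast(d_out.data());
    int *num = thrust::raw_pointer_cast(d_num.data());
    size_t temp_size = 0;
    cub::DevicePartition::If(nullptr, temp_size, in, out, num, len, Positive());
    thrust::device_vector<char> temp(temp_size);
    cub::DevicePartition::If(thrust::raw_pointer_cast(temp.data()), temp_size,
                             in, out, num, len, Positive());
    int n_sel = d_num[0];  // the host needs the count to locate the tail
    // Second pass: reverse the unselected tail to restore its input order.
    thrust::reverse(d_out.begin() + n_sel, d_out.end());
    *num_selected = n_sel;
    std::vector<int> h_out(len);
    thrust::copy(d_out.begin(), d_out.end(), h_out.begin());
    return h_out;
}
```

On the question's input this should produce exactly the desired {2, 1, 3, 0, -2, -1} with a count of 3, at the price of the extra reversal launch.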
Using thrust::stable_partition might therefore be sub-optimal, depending on the context. Ideally, whatever kernel makes use of the second grouping of the partitioned data should read its values in reverse order itself (i.e. perform kernel fusion), avoiding the extra pass.
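A rough sketch of that fusion idea: a hypothetical consumer kernel (consume_unselected, which just doubles each value as a stand-in for real work) reads CUB's reversed tail back-to-front, so its results follow the original input order without a separate reversal pass. Names, the doubling operation and the launch configuration are illustrative assumptions; error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <vector>

struct Positive {
    __host__ __device__ __forceinline__
    bool operator()(const int a) const { return a > 0; }
};

// Hypothetical consumer of the "false" group: doubles each unselected value.
// Thread i reads part[len - 1 - i], i.e. it walks the reversed tail
// back-to-front, so result[i] follows the original input order.
__global__ void consume_unselected(const int *part, const int *num_selected,
                                   int len, int *result) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int n_unsel = len - *num_selected;  // count is read on the device
    if (i < n_unsel) {
        result[i] = 2 * part[len - 1 - i];
    }
}

std::vector<int> fused_consume(const std::vector<int> &h_in) {
    int len = static_cast<int>(h_in.size());
    thrust::device_vector<int> d_in(h_in.begin(), h_in.end());
    thrust::device_vector<int> d_part(len), d_result(len), d_num(1);
    const int *in = thrust::raw_pointer_cast(d_in.data());
    int *part = thrust::raw_pointer_cast(d_part.data());
    int *num = thrust::raw_pointer_cast(d_num.data());
    size_t temp_size = 0;
    cub::DevicePartition::If(nullptr, temp_size, in, part, num, len, Positive());
    thrust::device_vector<char> temp(temp_size);
    cub::DevicePartition::If(thrust::raw_pointer_cast(temp.data()), temp_size,
                             in, part, num, len, Positive());
    // No reversal pass: the consumer handles the order itself.
    if (len > 0) {
        const int threads = 128;
        const int blocks = (len + threads - 1) / threads;
        consume_unselected<<<blocks, threads>>>(
            part, num, len, thrust::raw_pointer_cast(d_result.data()));
        cudaDeviceSynchronize();
    }
    int n_sel = d_num[0];  // host copy only to size the returned vector
    int n_unsel = len - n_sel;
    std::vector<int> h_result(n_unsel);
    thrust::copy(d_result.begin(), d_result.begin() + n_unsel, h_result.begin());
    return h_result;
}
```

With the question's input the unselected values in input order are 0, -2, -1, so fused_consume should return {0, -4, -2}.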
For modularity, one can wrap the iterators for the second grouping in thrust::reverse_iterator (e.g. via thrust::make_reverse_iterator), assuming that the kernel using them is properly templated to accept fancy iterators as input, as most of CUB's device algorithms are.
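As an illustration of that, here is a sketch that feeds a thrust::reverse_iterator over the partitioned tail straight into cub::DeviceScan::InclusiveSum. The choice of an inclusive sum as the consuming algorithm is ours (a prefix sum makes the element order visible), scan_unselected is a hypothetical name, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <thrust/device_vector.h>
#include <thrust/iterator/reverse_iterator.h>
#include <thrust/copy.h>
#include <vector>

struct Positive {
    __host__ __device__ __forceinline__
    bool operator()(const int a) const { return a > 0; }
};

// Partitions h_in, then computes an inclusive prefix sum over the unselected
// group in its original input order by passing CUB a reverse iterator.
std::vector<int> scan_unselected(const std::vector<int> &h_in) {
    int len = static_cast<int>(h_in.size());
    thrust::device_vector<int> d_in(h_in.begin(), h_in.end());
    thrust::device_vector<int> d_part(len), d_num(1);
    const int *in = thrust::raw_pointer_cast(d_in.data());
    int *part = thrust::raw_pointer_cast(d_part.data());
    int *num = thrust::raw_pointer_cast(d_num.data());
    size_t temp_size = 0;
    cub::DevicePartition::If(nullptr, temp_size, in, part, num, len, Positive());
    thrust::device_vector<char> temp(temp_size);
    cub::DevicePartition::If(thrust::raw_pointer_cast(temp.data()), temp_size,
                             in, part, num, len, Positive());
    int n_sel = d_num[0];
    int n_unsel = len - n_sel;
    // Walking backwards from the end of the buffer visits the reversed tail
    // in input order; no data is moved.
    auto rev_first = thrust::make_reverse_iterator(part + len);
    thrust::device_vector<int> d_scan(n_unsel);
    int *scan = thrust::raw_pointer_cast(d_scan.data());
    size_t scan_temp_size = 0;
    cub::DeviceScan::InclusiveSum(nullptr, scan_temp_size, rev_first, scan, n_unsel);
    thrust::device_vector<char> scan_temp(scan_temp_size);
    cub::DeviceScan::InclusiveSum(thrust::raw_pointer_cast(scan_temp.data()),
                                  scan_temp_size, rev_first, scan, n_unsel);
    std::vector<int> h_scan(n_unsel);
    thrust::copy(d_scan.begin(), d_scan.end(), h_scan.begin());
    return h_scan;
}
```

For the question's input the scan sees 0, -2, -1 and should return {0, -2, -3}; with a plain pointer over the tail it would see -1, -2, 0 instead.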