Reputation: 927
I have a vector of struct A in GPU device memory:
struct A{
int a;
int type;
};
I am trying to split the elements by type and store each type in its own array.
For example, if the types are 0, 1 and 2, then all the 0s, 1s and 2s
should each end up in a separate array.
The approach I am considering is a switch statement executed in parallel, but that could cause a lot of thread divergence and so may not be efficient.
Is there another approach that would help?
Upvotes: 0
Views: 282
Reputation: 2250
I would start by checking out Thrust:
http://docs.nvidia.com/cuda/thrust/
See also the related question "How fast is thrust::sort and what is the fastest radix sort implementation".
There is a way to sort with key/value pairs in case you need something else to be sorted along with it:
See the related question "Thrust Sort by key on the fly or different approach?".
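To make the key/value idea concrete, here is a minimal host-side C++ sketch of what a sort by key does: the keys (here the type values) are sorted, and a second array is permuted the same way. It is only an illustration of the semantics, not Thrust code; `sort_values_by_key` is a hypothetical helper name. In Thrust the corresponding calls are `thrust::sort_by_key` and `thrust::stable_sort_by_key`, which run on device data.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side illustration of key/value sorting (hypothetical helper, not a
// library function). Reorders `values` so they follow the stable ascending
// order of `keys`. Both vectors are assumed to have the same length.
template <typename V>
void sort_values_by_key(std::vector<int>& keys, std::vector<V>& values) {
    // Build the permutation that stably sorts the keys.
    std::vector<std::size_t> perm(keys.size());
    std::iota(perm.begin(), perm.end(), 0);
    std::stable_sort(perm.begin(), perm.end(),
                     [&](std::size_t i, std::size_t j) { return keys[i] < keys[j]; });

    // Apply the same permutation to both the keys and the values.
    std::vector<int> sorted_keys(keys.size());
    std::vector<V> sorted_values(values.size());
    for (std::size_t k = 0; k < perm.size(); ++k) {
        sorted_keys[k] = keys[perm[k]];
        sorted_values[k] = values[perm[k]];
    }
    keys.swap(sorted_keys);
    values.swap(sorted_values);
}
```

For the question above, the keys would be the type fields and the values would be the structs (or their indices). After the sort, each type occupies one contiguous range of the output.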
I have implemented my own CUDA radix sort, because I needed to keep track of an index number while sorting on a specific field. If the options above don't work for you, I can try to find some time to document it.
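Because the type field takes only a few values, a single counting pass (the building block of a radix sort) may be enough. The following is a host-side C++ reference sketch of that idea, not my CUDA implementation: `split_by_type` and `num_types` are hypothetical names, and every type is assumed to lie in `[0, num_types)`. Note that the type is used directly as an index, so no switch is needed.

```cpp
#include <cstddef>
#include <vector>

struct A {
    int a;
    int type;
};

// Host-side reference for one counting pass over `type` (hypothetical helper).
// Assumes every type is in [0, num_types).
// Returns the elements grouped by type; on return, the elements of type t
// occupy the output range [offsets[t], offsets[t + 1]).
std::vector<A> split_by_type(const std::vector<A>& in, std::size_t num_types,
                             std::vector<std::size_t>& offsets) {
    // 1. Histogram: count the elements of each type.
    std::vector<std::size_t> counts(num_types, 0);
    for (const A& x : in) ++counts[x.type];

    // 2. Exclusive prefix sum: starting offset of each type's group.
    offsets.assign(num_types + 1, 0);
    for (std::size_t t = 0; t < num_types; ++t)
        offsets[t + 1] = offsets[t] + counts[t];

    // 3. Scatter: write each element to the next free slot of its group.
    std::vector<std::size_t> cursor(offsets.begin(), offsets.end() - 1);
    std::vector<A> out(in.size());
    for (const A& x : in) out[cursor[x.type]++] = x;
    return out;
}
```

On the GPU, each step would map to a parallel primitive (a histogram, an exclusive scan, and a scatter). The serial cursor in step 3 would have to be replaced by a per-element rank, for example from atomics or a scan, which is the part a radix sort implementation handles for you.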
Upvotes: 1