Reputation: 387
Basically, I have an if() in my kernel, and when the condition is met I would like to store a new value in a dynamic list or array. The problem is that I can't use threadIdx as the index, because not every thread will store a value.
Something like:
__global__ void myKernel(customType *c)
{
    int i = threadIdx.x;
    //whatever
    if (condition)
        c->pop(newvalue);
}
In fact, I would like to avoid c[i] = newvalue, because then I would have to loop over every c[i] in the host code to check whether a value was inserted, and fill another structure accordingly. I thought about Thrust, but it seems like overkill for my "simple" problem.
Hope you can help me find a workaround.
Upvotes: 2
Views: 1403
Reputation: 72345
If I have understood your question correctly, you have two choices.
The first would be to pre-assign each thread an output location, and only have some threads write into their output. This leaves you with an output with gaps in it. You can eliminate the gaps using stream compaction, which is a solved problem in CUDA - a quick google search will turn up a number of options, and both Thrust and CUDPP have compaction functions you could use.
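For illustration, a minimal host-side sketch of that first option using thrust::copy_if; the sentinel value, the vector names and the predicate are assumptions for the example, not part of the original answer:

#include <thrust/device_vector.h>
#include <thrust/copy.h>

// Hypothetical predicate: a slot counts as "written" if it does not hold the
// sentinel value (-1 here) that the kernel leaves in untouched slots.
struct is_written
{
    __host__ __device__
    bool operator()(int x) const { return x != -1; }
};

// gapped: one slot per thread, written by the kernel (with gaps).
// Returns the number of values actually emitted; packed holds them densely.
int compact(const thrust::device_vector<int> &gapped,
            thrust::device_vector<int> &packed)
{
    packed.resize(gapped.size());
    // copy_if drops the sentinel slots, eliminating the gaps on the device.
    auto end = thrust::copy_if(gapped.begin(), gapped.end(),
                               packed.begin(), is_written());
    int count = static_cast<int>(end - packed.begin());
    packed.resize(count);
    return count;
}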
The second choice would be to use a global memory counter and have each thread atomically increment the counter as it uses a location in the output stream, so something like:
__device__ unsigned int opos; // global output counter; set to zero before the kernel launch

__global__ void myKernel(customType *c)
{
    //whatever
    if (condition) {
        // atomically claim the next free slot in the output array
        unsigned int pos = atomicAdd(&opos, 1);
        c[pos] = newval;
    }
}
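For completeness, a minimal host-side sketch of how the counter could be reset and read back; the launch configuration and d_c are placeholders, not part of the original answer:

// Reset the __device__ counter, launch, then read back how many values were written.
unsigned int zero = 0;
cudaMemcpyToSymbol(opos, &zero, sizeof(unsigned int));

myKernel<<<numBlocks, threadsPerBlock>>>(d_c);   // d_c: device output array

unsigned int count = 0;
cudaMemcpyFromSymbol(&count, opos, sizeof(unsigned int));
// the first `count` entries of d_c now hold the emitted values (in arbitrary order)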
If you have a Kepler card, and the number of threads expected to emit output is small, the second option will probably be faster. If that isn't the case, stream compaction is probably the better option.
Upvotes: 5
Reputation: 21128
If I understand correctly, you're describing stream compaction. Some, but not all, threads will create a value, and you want to store those values in an array without any gaps.
One way to implement this is using stream compaction algorithms available in Thrust (check out this example). Note that this does require you to perform the operation in two passes.
If you're doing this from within a single thread-block (as opposed to the entire grid) then you could also look at CUB. Each thread would compute a flag indicating if it wants to store a value, do a prefix-sum on the flags to determine each thread's offset in the list, then do the store.
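As an illustration of that single-block approach, here is a minimal sketch using cub::BlockScan; the input/output names, the condition, and the block size are assumptions for the example:

#include <cub/cub.cuh>

// One thread block compacts its values using a prefix sum over per-thread flags.
template <int BLOCK_THREADS>
__global__ void blockCompact(const int *in, int *out, int *outCount, int n)
{
    typedef cub::BlockScan<int, BLOCK_THREADS> BlockScan;
    __shared__ typename BlockScan::TempStorage temp_storage;

    int i = threadIdx.x;

    // Flag whether this thread wants to store a value (hypothetical condition).
    int value = (i < n) ? in[i] : 0;
    int flag  = (i < n && value > 0) ? 1 : 0;

    // Exclusive prefix sum of the flags gives each thread its output offset,
    // and the block-wide aggregate gives the total number of stored values.
    int offset, total;
    BlockScan(temp_storage).ExclusiveSum(flag, offset, total);

    if (flag)
        out[offset] = value;

    if (i == 0)
        *outCount = total;
}

It would be launched with a single block, e.g. blockCompact<256><<<1, 256>>>(d_in, d_out, d_count, n);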
Upvotes: 4