stranded

Reputation: 322

CUDA fill smaller arrays based on conditions

Suppose I have an array

X = [1,2,3,4,5,6,7,8,9,10]

Is it possible to create smaller arrays and fill them based on some conditions? For example, if I want to separate the numbers in X into arrays like

divisibleByTwo = [2,4,6,8,10]
divisibleByThree = [3,6,9]
divisibleByFour = [4,8]

In non-parallel code, it would be something like

std::vector<int> divisibleByTwo;
for (std::size_t i = 0; i < X.size(); i++)  // assumes X is a std::vector<int>
{
    if (X[i] % 2 == 0)  // remainder test; X[i]/2 == 0 would only match 0 and 1
    {
        divisibleByTwo.emplace_back(X[i]);
    }
}

But I cannot do the same thing in CUDA, because many threads appending to the same array at once would be a race condition.

What I really want to do is a comparison between two arrays, and store the indexes in a new array where a condition matches.

For example,

A = [1,2,3]
B = [3,3,2]

and I have to compare every element of A with B and find the indexes of B where the elements are equal. The result would be an array of arrays such that

C[0] = [ ]     // indexes of B matching element at index 0 of A (1)
C[1] = [2]     // indexes of B matching element at index 1 of A (2)
C[2] = [0, 1]  // indexes of B matching element at index 2 of A (3)
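For reference, a serial version of this comparison could look like the sketch below. The name `matchIndices` is hypothetical, and it assumes A and B are `std::vector<int>`. Each iteration of the outer loop touches only its own `C[i]`, so the rows do not depend on each other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from any library): for each element of A,
// collect the indexes of B that hold the same value.
std::vector<std::vector<int>> matchIndices(const std::vector<int>& A,
                                           const std::vector<int>& B) {
    std::vector<std::vector<int>> C(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) {      // row i only writes C[i]
        for (std::size_t j = 0; j < B.size(); ++j) {
            if (A[i] == B[j]) {
                C[i].push_back(static_cast<int>(j));  // record the index in B
            }
        }
    }
    assert(C.size() == A.size());  // one (possibly empty) row per element of A
    return C;
}
```

With A = [1,2,3] and B = [3,3,2], this gives the three rows listed above.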

Upvotes: 0

Views: 119

Answers (1)

KL-Yang

Reputation: 401

For divisibleByTwo, for example, you can launch 10 CUDA threads and do something like:

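__global__ void decimate(const float *x, float *y) {
   // With X = [1..10], the even values sit at the odd indexes 1, 3, 5, 7, 9.
   if(threadIdx.x<10 && threadIdx.x%2==1)
      y[threadIdx.x/2] = x[threadIdx.x];  // indexes 1,3,...,9 map to slots 0..4
}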

In the above example, half of the threads do nothing. Or you can launch a kernel with 5 threads:

__global__ void decimate(const float *x, float *y) {
   // Thread t reads the odd index 2t+1, which holds the even value 2t+2.
   if(threadIdx.x<5)
      y[threadIdx.x] = x[threadIdx.x*2+1];
}
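The kernels above rely on knowing ahead of time which positions are kept. When the condition depends on the data, one common pattern is stream compaction: flag each element, take an exclusive prefix sum of the flags to get each kept element's output slot, then scatter. Below is a minimal CPU sketch of those three steps in C++, not a CUDA implementation. The name `compactDivisibleBy` is hypothetical, and the scan is written as a plain serial loop. On a GPU, steps 1 and 3 would be one thread per element and step 2 a parallel scan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: keep the elements of x that are divisible by d,
// preserving their order, using flag -> exclusive scan -> scatter.
std::vector<int> compactDivisibleBy(const std::vector<int>& x, int d) {
    assert(d != 0);  // assumption: the divisor is non-zero
    const std::size_t n = x.size();

    // Step 1: flag[i] is 1 when x[i] passes the test (independent per element).
    std::vector<int> flag(n);
    for (std::size_t i = 0; i < n; ++i) {
        flag[i] = (x[i] % d == 0) ? 1 : 0;
    }

    // Step 2: exclusive prefix sum. pos[i] counts the kept elements before i,
    // which is exactly the output slot for x[i] if it is kept.
    std::vector<int> pos(n);
    int running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        pos[i] = running;
        running += flag[i];
    }

    // Step 3: scatter. Every kept element has a unique slot, so no two
    // writes target the same location.
    std::vector<int> out(static_cast<std::size_t>(running));
    for (std::size_t i = 0; i < n; ++i) {
        if (flag[i]) {
            out[pos[i]] = x[i];
        }
    }
    return out;
}
```

Because each kept element gets its own slot from the scan, there is no race on the output array. In practice you may not need to write this by hand: Thrust's `thrust::copy_if` and CUB's `cub::DeviceSelect` provide this kind of compaction on the device.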

Upvotes: 1
