Count values from array CUDA

Question

I have an array of float values, named life, and I want to count the number of entries with a value greater than 0 in CUDA.

On the CPU, the code would look like this:

int numParticles = 0;
for(int i = 0; i < MAX_PARTICLES; i++){
    if(life[i]>0){
        numParticles++;
    }
}

Now in CUDA, I've tried something like this:

__global__ void update(float* life, int* numParticles){
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (life[idx]>0){
        (*numParticles)++;
    }
}
//life is a filled device pointer
int launchCount(float* life)
{
    int numParticles = 0;
    int* numParticles_d = 0;
    cudaMalloc((void**)&numParticles_d, sizeof(int));
    update<<>>(life, numParticles_d);
    cudaMemcpy(&numParticles, numParticles_d, sizeof(int), cudaMemcpyDeviceToHost);
    std::cout << "numParticles: " << numParticles << std::endl;
}

But for some reason the CUDA attempt always returns 0 for numParticles. How come?

kangshiyin · Accepted Answer

Your code launches MAX_PARTICLES threads, and threads in multiple thread blocks execute (*numParticles)++; concurrently. That increment is a non-atomic read-modify-write, so this is a race condition: threads overwrite each other's updates. That is why you get 0 or, if you are lucky, sometimes a value a little bigger than 0.
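For illustration, one direct way to remove the race is atomicAdd. This is a swapped-in technique, not the reduction approach suggested in this answer, and it is only a minimal sketch: the names countAlive and launchCountAtomic, the explicit length parameter n, and the block size of 256 are assumptions, not part of the question's code. The sketch also zeroes the counter with cudaMemset, since cudaMalloc does not initialize memory.

```cuda
#include <cuda_runtime.h>

// Counts entries of life[0..n) that are greater than 0.
// atomicAdd makes each increment an indivisible read-modify-write,
// so concurrent threads cannot lose each other's updates.
__global__ void countAlive(const float* life, int n, int* count)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < n && life[idx] > 0.0f) {   // bounds check for the last block
        atomicAdd(count, 1);
    }
}

// life_d is assumed to be a device pointer to n already-filled floats.
int launchCountAtomic(const float* life_d, int n)
{
    if (n <= 0) return 0;

    int count = 0;
    int* count_d = nullptr;
    cudaMalloc((void**)&count_d, sizeof(int));
    cudaMemset(count_d, 0, sizeof(int));   // cudaMalloc does not zero memory

    const int threadsPerBlock = 256;       // assumed block size
    const int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    countAlive<<<numBlocks, threadsPerBlock>>>(life_d, n, count_d);

    // A blocking cudaMemcpy on the default stream waits for the kernel.
    cudaMemcpy(&count, count_d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(count_d);
    return count;
}
```

With many matching entries, all threads contend on a single counter, so a reduction is likely to scale better; the atomic version is mainly the smallest change that gives a correct count.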

Since you are trying to sum life[i]>0 ? 1 : 0 over all i, you could implement your kernel as a CUDA parallel reduction, or use a Thrust reduction to simplify your life.
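As a rough sketch of the Thrust route, the count can be written with thrust::count_if, which performs a counting reduction on the device. The answer only says "Thrust reduction", so the choice of count_if over thrust::reduce, the functor name IsAlive, the wrapper name countAliveThrust, and the length parameter n are all assumptions made for this example.

```cuda
#include <thrust/count.h>
#include <thrust/execution_policy.h>

// Predicate: true for entries greater than 0.
struct IsAlive
{
    __host__ __device__ bool operator()(float x) const { return x > 0.0f; }
};

// life_d is assumed to be a raw device pointer to n already-filled floats.
// The thrust::device policy tells Thrust to treat the raw pointers as
// device memory and run the count on the GPU.
int countAliveThrust(const float* life_d, int n)
{
    return static_cast<int>(
        thrust::count_if(thrust::device, life_d, life_d + n, IsAlive()));
}
```

Called as countAliveThrust(life, MAX_PARTICLES), this would replace both the update kernel and the manual cudaMalloc/cudaMemcpy of the counter, since Thrust returns the result directly to the host.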
