How to write to global memory from different threads in CUDA

Question

I have a kernel that searches several arrays (one thread per array). Whenever a thread finds a match, the result needs to be written to an array in global memory. The problem is: how can the threads access this global array without writing to the same location twice or leaving a location empty?

This is a pseudocode example of what I'm trying to do:

__global__ void find(*tableOfArrays, *Result, position)
{
   int idx = blockIdx.x * blockDim.x + threadIdx.x;
   if (idx < numOfArrays)
   {
     for (int i = 0; i < tableOfArrays[idx].length; i++)
     {
        if (Match(tableOfArrays[idx][i]))
        {
            // The position variable holds the next free index in the Result array.
            atomicAdd(&(position), (int)1);
            // I want to write each result to its own slot in Result.
            Result[position] = tableOfArrays[idx][i];
        }
     }
   }
}

The problem is that the threads do not access the Result array in order, and some threads end up writing to the same slot. Any help? Thank you.

Melvon · Accepted Answer

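You need to use the value that atomicAdd returns. atomicAdd returns the value that was stored at the address before the addition, so each thread gets its own index. If you read position again after atomicAdd has executed, another thread may already have accessed the memory and modified it.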

int localIndex = atomicAdd(&(position), (int)1);
Result[localIndex] = tableOfArrays[idx][i];
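
For a fuller picture, here is a minimal sketch of the same idea as a compilable program fragment. Several things in it are my assumptions and not part of the question: the table is flattened into one int array with a fixed row length, position is a pointer to an int counter in device memory, and isMatch, findMatches and runFind are hypothetical names (isMatch simply keeps even values). Error checking is left out for brevity, and rows is assumed to be greater than zero.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical match predicate for illustration only: keeps even values.
__device__ bool isMatch(int value) { return value % 2 == 0; }

// One thread per row of a flattened table (rows * rowLength ints).
// position points to a counter in global memory, initialised to 0.
__global__ void findMatches(const int *table, int rows, int rowLength,
                            int *result, int *position)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= rows) return;

    for (int i = 0; i < rowLength; ++i) {
        int value = table[idx * rowLength + i];
        if (isMatch(value)) {
            // atomicAdd returns the counter value from before the increment,
            // so every match reserves a distinct slot in result.
            int slot = atomicAdd(position, 1);
            result[slot] = value;
        }
    }
}

// Host helper (assumed name): runs the kernel and returns the matches.
// The order of the returned values is not deterministic.
std::vector<int> runFind(const std::vector<int> &table, int rows, int rowLength)
{
    int *dTable = nullptr, *dResult = nullptr, *dPosition = nullptr;
    size_t bytes = table.size() * sizeof(int);

    cudaMalloc(&dTable, bytes);
    cudaMalloc(&dResult, bytes);            // worst case: every element matches
    cudaMalloc(&dPosition, sizeof(int));
    cudaMemcpy(dTable, table.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(dPosition, 0, sizeof(int));  // counter starts at slot 0

    int threads = 128;
    int blocks = (rows + threads - 1) / threads;
    findMatches<<<blocks, threads>>>(dTable, rows, rowLength, dResult, dPosition);

    // The final counter value is the number of matches written.
    int count = 0;
    cudaMemcpy(&count, dPosition, sizeof(int), cudaMemcpyDeviceToHost);

    std::vector<int> matches(count);
    cudaMemcpy(matches.data(), dResult, count * sizeof(int),
               cudaMemcpyDeviceToHost);

    cudaFree(dTable);
    cudaFree(dResult);
    cudaFree(dPosition);
    return matches;
}
```

Calling runFind with the table {1, 2, 3, 4, 5, 6, 7, 8}, rows = 2 and rowLength = 4 should give back the four even values, with no duplicates and no gaps. Their order can change from run to run, because the slots are handed out in whatever order the threads reach atomicAdd.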
