Skipping every M elements when iterating through an array in CUDA

Question

I am new to CUDA programming and have been trying to figure out how to convert the following loop into a CUDA kernel.

for (int i = 0; i <= N; i += M) 
{
    output[i].x = signal[i].x;
    output[i].y = signal[i].y;
}

Following a vector_add example, I was able to get this:

__global__ void dec(const complex * signal, int N, int M, complex * output)
{

    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i <= N) 
    {
        output[i].x = signal[i].x;
        output[i].y = signal[i].y;
    }
}

This is where I am stuck. As I understand it, all threads run in parallel, so I wasn't sure where to tell the iterator to skip every M elements in CUDA. One alternative I thought of was to check i % M == 0, but I'd like to know whether there is anything else I should consider first, such as thread synchronization.

Any help is appreciated.

Robert Crovella · Accepted Answer

Something like this should work:

__global__ void dec(const complex * signal, int N, int M, complex * output)
{

    int i = blockIdx.x * blockDim.x + threadIdx.x; 
    i *= M;   // add this line
    if (i <= N) 
    {
        output[i].x = signal[i].x;
        output[i].y = signal[i].y;
    }
}

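You should also make sure that the multiplication by M doesn't overflow the int variable i. Each thread now handles one strided element, so only about N/M threads have any work to do. The overflow should be manageable by not launching unnecessary threads, i.e. don't launch a grid of significantly more than N/M threads.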
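
To make the launch sizing concrete, here is a minimal sketch of a complete program, not a definitive implementation. It assumes that complex is a two-float struct (typedef'd to float2 here), that both arrays hold at least N + 1 elements (the loop condition is i <= N), and it introduces a hypothetical helper, strided_count, that returns the number of iterations the original loop performs. The kernel checks the thread index before multiplying by M, so surplus threads in the last block never compute an out-of-range product.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Assumption: `complex` is a two-float struct; float2 stands in for it here.
typedef float2 complex;

// Hypothetical helper: number of iterations of
// `for (i = 0; i <= N; i += M)`, i.e. N / M + 1 (for N >= 0, M > 0).
__host__ __device__ inline int strided_count(int N, int M)
{
    return N / M + 1;
}

__global__ void dec(const complex *signal, int N, int M, complex *output)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;

    // Guard on the thread index first, so t * M is only computed
    // for threads whose product is known to be <= N.
    if (t < strided_count(N, M))
    {
        int i = t * M;
        output[i] = signal[i];   // copies both .x and .y
    }
}

int main()
{
    const int N = 1000, M = 7;
    const int len = N + 1;       // valid indices are 0..N inclusive

    std::vector<complex> h_signal(len), h_output(len);
    for (int i = 0; i < len; ++i)
        h_signal[i] = make_float2((float)i, (float)(-i));

    complex *d_signal = nullptr, *d_output = nullptr;
    cudaMalloc(&d_signal, len * sizeof(complex));
    cudaMalloc(&d_output, len * sizeof(complex));
    cudaMemcpy(d_signal, h_signal.data(), len * sizeof(complex),
               cudaMemcpyHostToDevice);
    cudaMemset(d_output, 0, len * sizeof(complex));

    // Launch roughly N / M threads, rounded up to whole blocks.
    const int threads = 256;
    const int work = strided_count(N, M);
    const int blocks = (work + threads - 1) / threads;
    dec<<<blocks, threads>>>(d_signal, N, M, d_output);

    cudaMemcpy(h_output.data(), d_output, len * sizeof(complex),
               cudaMemcpyDeviceToHost);

    // Every M-th element should be copied; the rest should stay zero.
    int errors = 0;
    for (int i = 0; i < len; ++i)
    {
        float ex = (i % M == 0) ? h_signal[i].x : 0.0f;
        float ey = (i % M == 0) ? h_signal[i].y : 0.0f;
        if (h_output[i].x != ex || h_output[i].y != ey)
            ++errors;
    }
    printf("mismatches: %d\n", errors);

    cudaFree(d_signal);
    cudaFree(d_output);
    return errors != 0;
}
```

Compared with the i % M == 0 approach from the question, this launches roughly N/M threads instead of N, and no thread sits idle apart from the few extras in the final block. No thread synchronization is needed, since each thread writes a distinct element.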
