Rajesh Shashi Kumar

Reputation: 137

What is the correlation between the dimensional layout of threads and the dimensions of the data itself in CUDA?

As a beginner, I've read that using a 2D block of threads is the simplest way to handle a 2D dataset. I am trying to implement the following two matrix operations in sequence:

  1. Swap the elements at adjacent even and odd positions in each row of the matrix

    1 2         2 1
    3 4 becomes 4 3
    
  2. Reflect the elements of the matrix across the principal diagonal (i.e. transpose it)

    2 1         2 4
    4 3 becomes 1 3
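The intended result of the two steps can be pinned down with a plain sequential version that runs on the host, which is handy as a reference when checking kernel output. This is a minimal sketch assuming row-major storage; the function name swap_and_reflect_cpu is hypothetical, and an M x N input produces an N x M output.

```cuda
#include <vector>

// Hypothetical host-side reference for the two operations.
// in:  M x N matrix, row-major (element (r, c) at in[r * N + c])
// out: N x M matrix, row-major (element (r, c) at out[r * M + c])
void swap_and_reflect_cpu(const float *in, float *out, int M, int N)
{
    std::vector<float> swapped(in, in + M * N);

    // Step 1: swap each even-position element with its odd neighbour,
    // row by row. With odd N the last column has no partner and stays put.
    for (int r = 0; r < M; ++r)
        for (int c = 0; c + 1 < N; c += 2) {
            float tmp = swapped[r * N + c];
            swapped[r * N + c] = swapped[r * N + c + 1];
            swapped[r * N + c + 1] = tmp;
        }

    // Step 2: reflect across the principal diagonal (transpose).
    for (int r = 0; r < M; ++r)
        for (int c = 0; c < N; ++c)
            out[c * M + r] = swapped[r * N + c];
}
```

For the 2x2 example, the input {1, 2, 3, 4} gives {2, 4, 1, 3}, i.e. the matrix 2 4 / 1 3.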
    

To implement this, I wrote the following kernel:

__global__ void swap_and_reflect(float *d_input, float *d_output, int M, int N)
{
    int j = threadIdx.x;
    int i = threadIdx.y;
    for(int t=0;t<M*N;t++)
        d_output[t] = d_input[t];
    float temp = 0.0;
    if (j%2 == 0){
        temp = d_output[j];
        d_output[j] = d_output[j+1];
        d_output[j+1] = temp;
    }
    __syncthreads(); // Wait for swap to complete
    if (i!=j){
        temp = d_output[i];
        d_output[i] = d_output[j];
        d_output[j] = temp;       
    }
}

The reflection does not happen as expected, and at this point I find myself confusing the 2D structure of the executing threads with the 2D structure of the matrix itself.

Could you please correct my understanding of the multi-dimensional arrangement of threads and how it relates to the dimensionality of the data? I believe this is why I get the reflection part wrong.

Any pointers/resources that could help me visualize/understand this correctly would be of immense help.

Thank you for reading.

Upvotes: 0

Views: 50

Answers (1)

talonmies

Reputation: 72342

The thread indices in your hypothetical 2x2 block are laid out as (x,y) pairs:

(0,0)  (0,1)
(1,0)  (1,1)

and the threads are numbered in this order (x varies fastest, so the thread ID is x + y*blockDim.x):

thread ID       (x,y) pair
---------       ----------
0               (0,0)
1               (1,0)
2               (0,1)
3               (1,1)
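A small kernel can confirm this ordering by writing each thread's (x,y) pair into the slot given by its linear thread ID. This is a sketch; record_thread_ids and thread_layout are hypothetical names, and error handling is reduced to a boolean return.

```cuda
#include <cuda_runtime.h>

// Stores each thread's (x, y) pair at its linear thread ID.
// CUDA numbers the threads of a 2D block with x varying fastest:
//   id = threadIdx.x + threadIdx.y * blockDim.x
__global__ void record_thread_ids(int2 *xy_by_id)
{
    int id = threadIdx.x + threadIdx.y * blockDim.x;
    xy_by_id[id] = make_int2(threadIdx.x, threadIdx.y);
}

// Hypothetical host helper: launches one bx-by-by block and copies the
// (x, y) pairs into host_xy, which must hold bx * by entries.
// Returns false if any CUDA call fails.
bool thread_layout(int bx, int by, int2 *host_xy)
{
    int n = bx * by;
    int2 *d_xy = nullptr;
    if (cudaMalloc(&d_xy, n * sizeof(int2)) != cudaSuccess) return false;
    record_thread_ids<<<1, dim3(bx, by)>>>(d_xy);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(host_xy, d_xy, n * sizeof(int2),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_xy);
    return ok;
}
```

Calling thread_layout(2, 2, xy) should fill xy in the same order as the table: (0,0), (1,0), (0,1), (1,1).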

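The thread layout says nothing about how your matrix is stored, so you need to choose an ordering for the array in memory and then index it accordingly in the kernel. For example, with the 2x2 matrix stored row-major (element (row, col) at col + 2*row), the reflection becomes: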

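// Use i < j rather than i != j so each off-diagonal pair is swapped by exactly
// one thread; with i != j, threads (i,j) and (j,i) race on the same two elements.
if (i < j){
    temp = d_output[i+2*j];
    d_output[i+2*j] = d_output[j+2*i];
    d_output[j+2*i] = temp;
}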
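For a general M x N matrix, an alternative that avoids in-place races altogether is to give each thread one input element, indexed with both blockIdx and threadIdx, and write the swapped-and-reflected value straight into a separate output buffer. This is a sketch under stated assumptions, not the in-place approach shown above: it assumes row-major storage, maps x to columns and y to rows, and swap_and_reflect_gpu is a hypothetical host wrapper. Because every access uses both row and col, it also covers the swap step, which in the question's kernel uses only j and so only ever touches the first row.

```cuda
#include <cuda_runtime.h>

// One thread per input element, row-major storage:
//   input  (M x N): element (row, col) at d_input[row * N + col]
//   output (N x M): element (row, col) at d_output[row * M + col]
// Each thread writes a distinct output location, so no __syncthreads()
// or temporaries are needed.
__global__ void swap_and_reflect(const float *d_input, float *d_output, int M, int N)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // x -> column
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // y -> row
    if (row >= M || col >= N) return;                 // grid may overhang the matrix

    // Swap: even column c reads c + 1, odd column c reads c - 1.
    // With odd N the last column has no partner and stays put.
    int src_col = col ^ 1;
    if (src_col >= N) src_col = col;

    // Reflect: element (row, col) of the swapped matrix goes to (col, row).
    d_output[col * M + row] = d_input[row * N + src_col];
}

// Hypothetical host wrapper: copies an M x N matrix to the device, runs the
// kernel with 16x16 blocks, and copies the N x M result back.
bool swap_and_reflect_gpu(const float *h_in, float *h_out, int M, int N)
{
    size_t bytes = size_t(M) * N * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    bool ok = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);
        dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
        swap_and_reflect<<<grid, block>>>(d_in, d_out, M, N);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

With this mapping, neighbouring threads in x read neighbouring input addresses, so the reads coalesce, while the writes are strided by M; for large matrices a shared-memory tiled transpose is the usual next step.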

Upvotes: 1
