What is the correlation between the dimensionality of threads and the dimensions of the data itself in CUDA?

Question

As a beginner, I've read that using a 2D block of threads is the simplest way to handle a 2D dataset. I am trying to implement the following matrix operations in sequence:

Swap the elements at even and odd positions in each row of the matrix (positions 0 and 1, 2 and 3, and so on)
```
1 2         2 1
3 4 becomes 4 3
```
Reflect the elements of the matrix across the principal diagonal (i.e. transpose it)
```
2 1         2 4
4 3 becomes 1 3
```
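To have a known-good result to compare a kernel against, the two steps can be written as a plain host-side loop. This is a minimal sketch, not the poster's code: the function name `swap_then_reflect_cpu` is hypothetical, the matrix is assumed to be M x N and stored row-major (element (r, c) at index `r * N + c`), and in a row of odd width the last element is assumed to stay in place.

```cuda
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for the two steps in the question.
// Assumes row-major storage: element (r, c) of an M x N matrix is in[r * N + c].
std::vector<float> swap_then_reflect_cpu(const std::vector<float> &in, int M, int N)
{
    // Step 1: swap neighbouring columns (0<->1, 2<->3, ...) in every row.
    // If N is odd, the last column has no partner and is left alone.
    std::vector<float> swapped(in);
    for (int r = 0; r < M; ++r)
        for (int c = 0; c + 1 < N; c += 2) {
            float tmp = swapped[r * N + c];
            swapped[r * N + c] = swapped[r * N + c + 1];
            swapped[r * N + c + 1] = tmp;
        }

    // Step 2: reflect across the principal diagonal (transpose).
    // The result is N x M, row-major: out(c, r) = swapped(r, c).
    std::vector<float> out(static_cast<std::size_t>(M) * N);
    for (int r = 0; r < M; ++r)
        for (int c = 0; c < N; ++c)
            out[c * M + r] = swapped[r * N + c];
    return out;
}
```

For the 2x2 example above, `swap_then_reflect_cpu({1, 2, 3, 4}, 2, 2)` returns `{2, 4, 1, 3}`, which is the matrix `2 4 / 1 3`.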

To implement this, I wrote the following kernel:

```cuda
__global__ void swap_and_reflect(float *d_input, float *d_output, int M, int N)
{
    int j = threadIdx.x;
    int i = threadIdx.y;
    for(int t=0;t
    /* ... the rest of the kernel is cut off in the original post ... */
```

The reflection does not happen as expected. At this point, I find myself confusing the 2D structure of the executing threads with the 2D structure of the matrix itself.

Could you please correct my understanding of the multi-dimensional arrangement of threads and how it relates to the dimensionality of the data itself? I believe this is why the reflection part is incorrect.

Any pointers/resources that could help me visualize/understand this correctly would be of immense help. 

Thank you for reading.

talonmies · Accepted Answer

The thread indices in your hypothetical 2x2 block are laid out in (x,y) pairs as

```
(0,0)  (0,1)
(1,0)  (1,1)
```

and the ordering of thread IDs within the block (threadIdx.x varies fastest) is

```
thread ID       (x,y) pair
---------       ----------
0               (0,0)
1               (1,0)
2               (0,1)
3               (1,1)
```
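The table follows from the rule the CUDA C++ Programming Guide gives for a 2D block of size (Dx, Dy): the thread at (x, y) has ID x + y * Dx (a 3D block adds a z * Dx * Dy term). A small sketch that makes this visible; the helper `linear_tid` and the kernel `record_thread_coords` are hypothetical names, and the helper is marked `__host__ __device__` only so the arithmetic can also be checked on the CPU.

```cuda
#include <cuda_runtime.h>

// Linear thread ID inside a 2D block: x varies fastest, then y.
// __host__ __device__ so the same arithmetic can be tested on the host.
__host__ __device__ inline int linear_tid(int x, int y, int blockDimX)
{
    return x + y * blockDimX;
}

// Hypothetical demo kernel: each thread stores its own x and y
// at the position given by its linear thread ID.
__global__ void record_thread_coords(int *xs, int *ys)
{
    int tid = linear_tid(threadIdx.x, threadIdx.y, blockDim.x);
    xs[tid] = threadIdx.x;
    ys[tid] = threadIdx.y;
}
```

Launched as `record_thread_coords<<<1, dim3(2, 2)>>>(d_xs, d_ys);` with two 4-element device arrays, the copied-back `xs` should read 0, 1, 0, 1 and `ys` 0, 0, 1, 1, which reproduces the table. So if x is used as the column index and y as the row index, consecutive thread IDs walk along one row of a row-major matrix.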

You need to choose an ordering for your array in memory (row-major or column-major) and then make your kernel's indexing match it, for example:

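```cuda
// Square N x N matrix, row-major: element (row i, col j) is d_output[i*N + j].
// If the swap step ran earlier in the same kernel, put a __syncthreads() here.
if (i < j) {   // only one thread of each (i,j)/(j,i) pair swaps, so the pair is exchanged once
    float temp = d_output[i*N + j];
    d_output[i*N + j] = d_output[j*N + i];
    d_output[j*N + i] = temp;
}
```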
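A common way to tie the thread dimensions to the data dimensions is to let threadIdx.x pick the column and threadIdx.y pick the row, and to write into a separate output buffer so that no two threads touch the same element. The sketch below fuses both steps that way. It is an illustration under stated assumptions, not the accepted answer's code: the names `swap_partner`, `swap_reflect_element` and `swap_and_reflect_sketch` are hypothetical, the input is assumed to be an M x N row-major matrix, and the output is its N x M row-major reflection.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: the column that column c trades places with in
// step 1 (0<->1, 2<->3, ...). In a row of odd width the last column stays put.
__host__ __device__ inline int swap_partner(int c, int N)
{
    int p = c ^ 1;              // even c -> c + 1, odd c -> c - 1
    return (p < N) ? p : c;
}

// The work of the single thread that owns input element (r, c).
// Input: M x N row-major. Output: N x M row-major, out(c, r) = swapped(r, c).
__host__ __device__ inline void swap_reflect_element(const float *in, float *out,
                                                     int r, int c, int M, int N)
{
    out[c * M + r] = in[r * N + swap_partner(c, N)];
}

// x -> column, y -> row; blockIdx extends the mapping beyond a single block.
__global__ void swap_and_reflect_sketch(const float *d_input, float *d_output,
                                        int M, int N)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;   // column index
    int r = blockIdx.y * blockDim.y + threadIdx.y;   // row index
    if (r < M && c < N)                              // skip threads past the matrix edge
        swap_reflect_element(d_input, d_output, r, c, M, N);
}
```

A launch covering the whole matrix could look like `dim3 block(16, 16); dim3 grid((N + 15) / 16, (M + 15) / 16); swap_and_reflect_sketch<<<grid, block>>>(d_input, d_output, M, N);`. Because every output element has exactly one writer, no `__syncthreads()` is needed between the two steps. The writes to the output are strided, so for large matrices a shared-memory tiled transpose is the usual faster alternative.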
