Reputation: 137
As a beginner, I've read that using a 2D block of threads is the simplest way to deal with a 2D dataset. I am trying to implement the following matrix operations in sequence:
Swap elements at odd and even positions of each row in the matrix
1 2           2 1
3 4  becomes  4 3
Reflect the elements of the matrix across the principal diagonal
2 1           2 4
4 3  becomes  1 3
To implement this, I wrote the following kernel:
__global__ void swap_and_reflect(float *d_input, float *d_output, int M, int N)
{
    int j = threadIdx.x;
    int i = threadIdx.y;
    for(int t=0;t<M*N;t++)
        d_output[t] = d_input[t];
    float temp = 0.0;
    if (j%2 == 0){
        temp = d_output[j];
        d_output[j] = d_output[j+1];
        d_output[j+1] = temp;
    }
    __syncthreads(); // Wait for swap to complete
    if (i!=j){
        temp = d_output[i];
        d_output[i] = d_output[j];
        d_output[j] = temp;
    }
}
The reflection does not happen as expected. At this point, I find myself confusing the 2D structure of the executing threads with the 2D structure of the matrix itself.
Could you please correct my understanding of the multi-dimensional arrangement of threads and how it relates to the dimensionality of the data? I believe this is why the reflection part is incorrect.
Any pointers/resources that could help me visualize/understand this correctly would be of immense help.
Thank you for reading.
Upvotes: 0
Views: 50
Reputation: 72342
The thread indices are laid out in your hypothetical 2x2 (four-thread) block in (x,y) pairs as
(0,0) (0,1)
(1,0) (1,1)
and the ordering is
thread ID   (x,y) pair
---------   ----------
    0         (0,0)
    1         (1,0)
    2         (0,1)
    3         (1,1)
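The ordering in the table follows the rule that, in a block of dimensions (Dx, Dy), the thread at (x,y) has linear ID x + y*Dx, so x varies fastest. A minimal sketch that records this mapping for a single 2x2 block is given here; the names `record_thread_ids` and `collect_thread_ids` are made up for illustration, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores its (x,y) pair at its linear thread ID.
__global__ void record_thread_ids(int *xs, int *ys)
{
    int tid = threadIdx.x + threadIdx.y * blockDim.x;  // x varies fastest
    xs[tid] = threadIdx.x;
    ys[tid] = threadIdx.y;
}

// Hypothetical host helper: launches one 2x2 block and copies the pairs back.
// h_xs and h_ys must each have room for 4 ints. Error checking is omitted.
void collect_thread_ids(int *h_xs, int *h_ys)
{
    const int n = 4;
    int *d_xs = nullptr, *d_ys = nullptr;
    cudaMalloc(&d_xs, n * sizeof(int));
    cudaMalloc(&d_ys, n * sizeof(int));

    record_thread_ids<<<1, dim3(2, 2)>>>(d_xs, d_ys);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(h_xs, d_xs, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_ys, d_ys, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_xs);
    cudaFree(d_ys);
}
```

After calling `collect_thread_ids(xs, ys)`, entry `t` of the two arrays holds the (x,y) pair of thread ID `t`, in the same order as the table.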
You need to choose an ordering for your array in memory and then modify your kernel accordingly, for example:
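if (i < j) {  // only one thread of each (i,j)/(j,i) pair swaps, so the pair is swapped once
    temp = d_output[i+2*j];   // 2 is the row width of the 2x2 example
    d_output[i+2*j] = d_output[j+2*i];
    d_output[j+2*i] = temp;
}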
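To put the whole operation together, here is one possible sketch that swaps in a different technique from the in-place swap: each thread reads one element of `d_input` and writes one element of `d_output`, so no `__syncthreads()` is needed and no two threads write the same location. It assumes row-major storage, an M x N input (M rows, N columns), an N x M output, a single block with M*N no larger than the per-block thread limit, and that the last column of an odd-width row is left unswapped. The names `swap_and_reflect_gather` and `run_swap_and_reflect` are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: thread (x = column j, y = row i) produces one output element.
// Input is M x N row-major; output is the N x M transpose of the column-swapped input.
__global__ void swap_and_reflect_gather(const float *d_input, float *d_output,
                                        int M, int N)
{
    int j = threadIdx.x;  // column index in the input
    int i = threadIdx.y;  // row index in the input
    if (i >= M || j >= N) return;

    // Column whose value lands in column j after the odd/even swap.
    int p = (j % 2 == 0) ? j + 1 : j - 1;
    if (p >= N) p = j;    // assumption: an unpaired last column stays in place

    // Swapped element (i, j) is written to transposed position (j, i).
    d_output[j * M + i] = d_input[i * N + p];
}

// Hypothetical host helper: copies data in, launches one block, copies data out.
void run_swap_and_reflect(const float *h_in, float *h_out, int M, int N)
{
    size_t bytes = static_cast<size_t>(M) * N * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    dim3 block(N, M);     // x spans columns, y spans rows
    swap_and_reflect_gather<<<1, block>>>(d_in, d_out, M, N);

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}
```

For the 2x2 input 1 2 / 3 4 from the question, this yields 2 4 / 1 3. Writing to a separate output buffer avoids the ordering problems of swapping in place, at the cost of not being usable when input and output must share the same memory.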
Upvotes: 1