Reputation: 20654
I am having a problem transposing an image.
Here is the relevant part of the kernel I call:
// index of the pixel on the image
int index_in = index_x + index_y * width;
int index_out = index_x + index_y*height;
// Allocate the shared memory
__shared__ unsigned int onchip_storage[16][16];
// Load the inputs to the shared memory
onchip_storage[threadIdx.y][threadIdx.x] = in[index_in];
// Save the output value to the memory
out[index_out] = onchip_storage[threadIdx.x][threadIdx.y];
The image comes out rotated, but the colors are different from the original. Any idea why?
Thanks in advance.
Upvotes: 0
Views: 900
Reputation: 1004
Can you just use a matrix transpose routine, treating the image as a width x height matrix of int3 elements? Those routines are already very well optimized. In particular, the "diagonal" variant in NVIDIA's sample code is much faster than the naive implementation.
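To illustrate the element-type idea from this answer, here is a minimal host-side C++ sketch. It is a naive transpose, not NVIDIA's optimized kernel. It assumes the image is an interleaved buffer of three `int` components per pixel. `Pixel3` is a hypothetical stand-in for CUDA's built-in `int3`, and `transpose`/`transpose_rgb` are illustrative names. Because the transpose moves whole 12-byte elements, R, G and B always stay together:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-side stand-in for CUDA's int3: three 4-byte ints, 12 bytes per pixel.
struct Pixel3 { int x, y, z; };
static_assert(sizeof(Pixel3) == 3 * sizeof(int), "Pixel3 must have no padding");

// Naive out-of-place transpose of a matrix with `height` rows of `width` elements.
// The element type decides what moves as a single unit.
template <typename T>
std::vector<T> transpose(const std::vector<T>& in, int width, int height) {
    assert(in.size() == static_cast<std::size_t>(width) * height);
    std::vector<T> out(in.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            out[static_cast<std::size_t>(x) * height + y] =
                in[static_cast<std::size_t>(y) * width + x];
    return out;
}

// Transpose an interleaved RGB buffer (3 ints per pixel) by viewing it as a
// width x height matrix of Pixel3 elements, so the components never get separated.
std::vector<int> transpose_rgb(const std::vector<int>& rgb, int width, int height) {
    std::vector<Pixel3> pixels(static_cast<std::size_t>(width) * height);
    assert(rgb.size() == pixels.size() * 3);
    std::memcpy(pixels.data(), rgb.data(), rgb.size() * sizeof(int));
    const std::vector<Pixel3> t = transpose(pixels, width, height);
    std::vector<int> result(rgb.size());
    std::memcpy(result.data(), t.data(), result.size() * sizeof(int));
    return result;
}
```

On the GPU, the same idea would amount to declaring the input, the output and the shared tile as `int3` instead of `unsigned int`. An existing transpose kernel should then work with little change beyond the element type.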
Upvotes: 1
Reputation: 213059
Assuming your RGB components are interleaved, your algorithm is not handling the three components correctly. You need to make your tile width a multiple of 3, e.g. 18 x 18. Then, when you do the transpose, you need to transpose elements that are 3 x 4 = 12 bytes wide, i.e. a whole pixel of three 4-byte components at a time.
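Below is a minimal CPU emulation of one way to read this answer. It assumes interleaved `unsigned int` components (3 x 4 = 12 bytes per pixel). The tile is 18 ints (6 pixels) wide, a multiple of 3, and 6 rows tall, so it holds a square 6 x 6 block of pixels. The tile geometry, the name `transpose_rgb_tiled`, and the use of loops in place of `blockIdx`/`threadIdx` are illustrative assumptions. Loads copy one component per "thread". Stores move each pixel's three components as a group. The output index also swaps the block coordinates, which a tiled transpose needs for tiles off the diagonal. Assuming the question's `index_x`/`index_y` are the usual `blockIdx * blockDim + threadIdx`, its `index_out` does not do this swap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile geometry: 6 x 6 pixels, i.e. 6 rows of 18 unsigned ints.
// 18 is a multiple of 3, so a tile row never splits a pixel.
constexpr int TILE_PIX = 6;
constexpr int TILE_W = 3 * TILE_PIX;

// CPU emulation of a shared-memory tiled transpose for interleaved RGB data.
// The bx/by loops stand in for blockIdx, the inner loops for threadIdx, and
// `tile` for the __shared__ array. `in` holds `height` rows of `width` pixels;
// the result holds `width` rows of `height` pixels.
std::vector<unsigned> transpose_rgb_tiled(const std::vector<unsigned>& in,
                                          int width, int height) {
    assert(in.size() == static_cast<std::size_t>(width) * height * 3);
    std::vector<unsigned> out(in.size());
    unsigned tile[TILE_PIX][TILE_W];
    const int grid_x = (width + TILE_PIX - 1) / TILE_PIX;
    const int grid_y = (height + TILE_PIX - 1) / TILE_PIX;
    for (int by = 0; by < grid_y; ++by) {
        for (int bx = 0; bx < grid_x; ++bx) {
            // Load: one component per "thread", so reads along a row stay contiguous.
            for (int ty = 0; ty < TILE_PIX; ++ty) {
                for (int k = 0; k < TILE_W; ++k) {
                    const int px = bx * TILE_PIX + k / 3;
                    const int py = by * TILE_PIX + ty;
                    if (px < width && py < height)
                        tile[ty][k] =
                            in[(static_cast<std::size_t>(py) * width + px) * 3 + k % 3];
                }
            }
            // On the GPU a __syncthreads() belongs here: the store below reads
            // tile entries that other threads loaded.
            // Store: swap block coordinates and copy each pixel's 3 components together.
            for (int ty = 0; ty < TILE_PIX; ++ty) {
                for (int tx = 0; tx < TILE_PIX; ++tx) {
                    const int ox = by * TILE_PIX + tx;  // output column, < height
                    const int oy = bx * TILE_PIX + ty;  // output row, < width
                    if (ox < height && oy < width)
                        for (int c = 0; c < 3; ++c)
                            out[(static_cast<std::size_t>(oy) * height + ox) * 3 + c] =
                                tile[tx][3 * ty + c];
                }
            }
        }
    }
    return out;
}
```

The image dimensions in the test below are deliberately not multiples of the tile size, so the bounds checks on partial tiles are exercised.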
Upvotes: 1