olidev

Reputation: 20654

Transpose an image in CUDA

I am having a problem transposing an image.

I call the kernel method:

    // index of the pixel on the image
    int index_in  = index_x + index_y * width;

    int index_out = index_x + index_y*height;   

    // Allocate the shared memory
    __shared__ unsigned int onchip_storage[16][16];

    // Load the inputs to the shared memory
    onchip_storage[threadIdx.y][threadIdx.x] =  in[index_in];            

    // Save the output value to the memory  
    out[index_out] = onchip_storage[threadIdx.x][threadIdx.y];

The image comes out rotated, but the colors no longer match the original. Any ideas?

Thanks in advance.

Upvotes: 0

Views: 900

Answers (2)

Devin Lane

Reputation: 1004

Can you just use an existing matrix transpose routine, treating the image as a width x height "matrix" of int3 elements? Those routines are already well optimized; in particular, the "diagonal" variant in NVIDIA's sample code is much faster than the naive implementation.
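
As a rough illustration of this idea, here is a minimal sketch of a tiled shared-memory transpose whose element type is a whole pixel. It assumes each pixel is three 32-bit components and uses the built-in `uint3` type for that; the kernel name `transposePixels`, the `TILE` size, and the test image dimensions are made up for the example. It is the plain tiled version, not the "diagonal" variant from the NVIDIA sample, and it omits error checking on the CUDA calls.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 16

// Transposes a width x height image of uint3 pixels into a height x width image.
__global__ void transposePixels(const uint3 *in, uint3 *out, int width, int height)
{
    // One extra column of padding to reduce shared memory bank conflicts
    __shared__ uint3 tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;   // input column
    int y = blockIdx.y * TILE + threadIdx.y;   // input row
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];

    // Every thread must finish loading before any thread reads the tile
    __syncthreads();

    // Swap the block indices so the tile lands at its transposed position
    int tx = blockIdx.y * TILE + threadIdx.x;  // output column, range 0..height-1
    int ty = blockIdx.x * TILE + threadIdx.y;  // output row, range 0..width-1
    if (tx < height && ty < width)
        out[ty * height + tx] = tile[threadIdx.x][threadIdx.y];
}

int main()
{
    const int width = 37, height = 21;         // deliberately not multiples of TILE
    std::vector<uint3> h_in(width * height), h_out(width * height);
    for (int i = 0; i < width * height; ++i)
        h_in[i] = make_uint3(3 * i, 3 * i + 1, 3 * i + 2);

    uint3 *d_in, *d_out;
    size_t bytes = h_in.size() * sizeof(uint3);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    transposePixels<<<grid, block>>>(d_in, d_out, width, height);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    // Compare against the definition: out(x, y) must equal in(y, x)
    int errors = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            uint3 a = h_in[y * width + x], b = h_out[x * height + y];
            if (a.x != b.x || a.y != b.y || a.z != b.z) ++errors;
        }
    printf("mismatches: %d\n", errors);

    cudaFree(d_in);
    cudaFree(d_out);
    return errors != 0;
}
```

Because each thread moves a complete pixel, the three components cannot be separated or reordered by the transpose. Note also that the output index is built from the swapped block indices and that `__syncthreads()` sits between the shared-memory write and read.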

Upvotes: 1

Paul R

Reputation: 213059

Assuming your RGB components are interleaved, your algorithm is not handling the three components correctly. You really need to make your tile width a multiple of 3, e.g. 18 x 18. Then, when you do the transpose, you need to move elements that are 3 x 4 = 12 bytes wide.
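
To make the 12-byte element point concrete, here is a minimal sketch of the indexing for a flat interleaved buffer. It assumes the image is stored as `width * height` pixels of three consecutive `unsigned int` components (R, G, B); the kernel name and parameters are hypothetical. It is a naive global-memory version for clarity and does not implement the 18 x 18 shared-memory tile suggested above.

```cuda
#include <cuda_runtime.h>

// Assumption: `in` holds width * height pixels, each stored as three
// consecutive unsigned int components (R, G, B). `out` has the same size.
// Launch with a 2D grid covering width x height pixels, for example
// dim3 block(16, 16); dim3 grid((width + 15) / 16, (height + 15) / 16);
__global__ void transposeInterleavedRGB(const unsigned int *in, unsigned int *out,
                                        int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // pixel column in the input
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // pixel row in the input
    if (x >= width || y >= height) return;

    int src = 3 * (y * width + x);    // first component of input pixel (x, y)
    int dst = 3 * (x * height + y);   // same pixel at (y, x) in the output

    // Move the three components together so R, G, B stay in order
    out[dst + 0] = in[src + 0];
    out[dst + 1] = in[src + 1];
    out[dst + 2] = in[src + 2];
}
```

The key difference from a per-component transpose is the factor of 3 applied to the pixel index rather than to the thread index, so one thread owns one whole pixel.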

Upvotes: 1
