Access to each matrix point

Question

I have a problem accessing each point of a matrix in a CUDA kernel. I'm working with OpenCV and trying to "do something" to each point of the matrix. So I convert the uint8_t matrix to a float matrix like this:

for(int i=0; i



This works for a 3-channel image: if I create an output image from this matrix (after converting it back to uint8_t), it looks the same as the input.
But I want to make some changes using CUDA:

I set block size and grid size like this:

dim3 dimBlock(count, 3);
dim3 dimGrid( frameHeight/count, frameWidth/count);


Here count is the number of threads, 3 is the number of channels, and frameHeight and frameWidth are the frame dimensions.

So I allocated GPUexamMatrix and GPUresultMatrix and tried to access each point in the kernel. My kernel looks like this:

resultMatrix[(blockIdx.x * blockIdx.y) + (threadIdx.x * threadIdx.y)] = examMatrix[(blockIdx.x * blockIdx.y) + (threadIdx.x * threadIdx.y)];


So, as you can see, I simply tried to copy the matrix. After this operation, when I copied the matrix back to the host and printed it, I got really small or really big float numbers in it, not the numbers from the exam matrix.

I suppose I'm doing something wrong inside the kernel. Any ideas?

Robert Crovella · Accepted Answer

Your usage of the built-in variables can't possibly be correct. To give just one simple example, threadIdx.x = 0 and threadIdx.y = 2 will access the same point as threadIdx.x = 2 and threadIdx.y = 0. You don't have unique indexing. I could try to give you something that will work, but I'm a little confused by your dimBlock variable. In short, this is not how I would set up grid/block/indexing to handle a 2D array. I wouldn't use 3, the channel number, in my threadblock dimensioning.
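To make the collision concrete, here is a minimal host-side sketch in plain C++ (no GPU needed) that evaluates the index expression from the question for every block and thread of a launch and counts how many distinct elements would be touched. The names askerIndex and distinctIndices are hypothetical helpers introduced only for this illustration; plain integers stand in for blockIdx and threadIdx.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical helper: the index expression from the question's kernel,
// with plain integers standing in for blockIdx.x/y and threadIdx.x/y.
std::size_t askerIndex(unsigned bx, unsigned by, unsigned tx, unsigned ty) {
    return static_cast<std::size_t>(bx) * by +
           static_cast<std::size_t>(tx) * ty;
}

// Hypothetical helper: count the distinct indices produced by a launch of
// gridX x gridY blocks, each with blockX x blockY threads.
std::size_t distinctIndices(unsigned gridX, unsigned gridY,
                            unsigned blockX, unsigned blockY) {
    assert(gridX > 0 && gridY > 0 && blockX > 0 && blockY > 0);
    std::set<std::size_t> seen;
    for (unsigned by = 0; by < gridY; ++by)
        for (unsigned bx = 0; bx < gridX; ++bx)
            for (unsigned ty = 0; ty < blockY; ++ty)
                for (unsigned tx = 0; tx < blockX; ++tx)
                    seen.insert(askerIndex(bx, by, tx, ty));
    return seen.size();
}
```

For example, distinctIndices(1, 1, 16, 3) emulates a single block of dim3(16, 3) and returns fewer distinct indices than the 48 threads launched, because many threads map to the same element. Most of the output buffer is then never written, which would plausibly explain the garbage values printed on the host if GPUresultMatrix was not initialized.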

Try something like this:

// keep count small, e.g. 16: count*count must not exceed the per-block thread limit (512 or 1024, depending on GPU)
dim3 dimBlock(count, count);
dim3 dimGrid( frameWidth/dimBlock.x, frameHeight/dimBlock.y);

and in your kernel:

int row = threadIdx.y + blockIdx.y*blockDim.y;
int col = threadIdx.x + blockIdx.x*blockDim.x;

resultMatrix[3*(row*frameWidth + col) + 0] = examMatrix[3*(row*frameWidth + col) + 0];
resultMatrix[3*(row*frameWidth + col) + 1] = examMatrix[3*(row*frameWidth + col) + 1];
resultMatrix[3*(row*frameWidth + col) + 2] = examMatrix[3*(row*frameWidth + col) + 2];

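The above assumes frameWidth and frameHeight are evenly divisible by count. Otherwise the integer division in dimGrid rounds down, and the pixels along the right and bottom edges are never processed.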
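If the frame size is not a multiple of count, a common approach is to round the grid size up and have out-of-range threads skip their work. The sketch below emulates that on the host in plain C++ so it can be checked without a GPU; divUp and emulateCopy are hypothetical names, and the loops stand in for the blocks and threads of a real launch. In an actual kernel the guard would be an early return, and frameWidth and frameHeight would have to be passed in as kernel arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ceiling division: how many blocks of size `block` are needed to cover n.
// CUDA equivalent: dim3 dimGrid(divUp(frameWidth, count), divUp(frameHeight, count));
unsigned divUp(unsigned n, unsigned block) {
    assert(block > 0);
    return (n + block - 1) / block;
}

// Host-side emulation of the copy kernel with count x count thread blocks.
// exam holds 3 interleaved channels per pixel, row by row.
std::vector<float> emulateCopy(const std::vector<float>& exam,
                               unsigned frameWidth, unsigned frameHeight,
                               unsigned count) {
    assert(exam.size() ==
           static_cast<std::size_t>(3) * frameWidth * frameHeight);
    std::vector<float> result(exam.size(), -1.0f);  // -1 marks "never written"
    const unsigned gridX = divUp(frameWidth, count);
    const unsigned gridY = divUp(frameHeight, count);
    for (unsigned by = 0; by < gridY; ++by)
        for (unsigned bx = 0; bx < gridX; ++bx)
            for (unsigned ty = 0; ty < count; ++ty)
                for (unsigned tx = 0; tx < count; ++tx) {
                    unsigned row = ty + by * count;  // threadIdx.y + blockIdx.y*blockDim.y
                    unsigned col = tx + bx * count;  // threadIdx.x + blockIdx.x*blockDim.x
                    // Bounds guard; in a kernel this would be `return;`.
                    if (row >= frameHeight || col >= frameWidth) continue;
                    std::size_t base =
                        3 * (static_cast<std::size_t>(row) * frameWidth + col);
                    for (unsigned c = 0; c < 3; ++c)
                        result[base + c] = exam[base + c];
                }
    return result;
}
```

With a 5x3 frame and count = 4, the grid is rounded up to 2x1 blocks, the threads that fall outside the frame are skipped, and every element of the result matches the input.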
