caro

Reputation: 411

Accessing each point of a matrix in a CUDA kernel

I have a problem accessing each point of a matrix in a CUDA kernel. I'm working with OpenCV and I'm trying to "do something" to each point of the matrix. First, I convert the uint8_t matrix to a float matrix like this:

// cn = channel count (3 for BGR); the matrix data is assumed to be continuous
for(int i=0; i<inputMatrix.rows; ++i){
    for(int j=0; j<inputMatrix.cols * cn; j+=cn){
        int idx = i*inputMatrix.cols*cn + j;     // first (B) channel of this pixel
        examMatrix[idx + 0] = pixelPtr[idx + 0]; // B
        examMatrix[idx + 1] = pixelPtr[idx + 1]; // G
        examMatrix[idx + 2] = pixelPtr[idx + 2]; // R
    }
}
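For reference, the loop addresses channel c of a pixel as i*inputMatrix.cols*cn + j + c, where j already equals column*cn. The host-side C++ sketch below (the helper names loopIndex and pixelIndex are hypothetical, and a continuous matrix with no row padding is assumed) checks that this is the same as (row*cols + col)*cn + c, i.e. the interleaved B, G, R layout that any kernel indexing has to reproduce.

```cpp
#include <cassert>
#include <cstddef>

// Index as computed by the conversion loop: j already steps over channels (j = col * cn).
std::size_t loopIndex(int i, int j, int c, int cols, int cn) {
    return static_cast<std::size_t>(i) * cols * cn + j + c;
}

// The same element addressed as (row, col, channel) in an interleaved buffer.
std::size_t pixelIndex(int row, int col, int c, int cols, int cn) {
    assert(c >= 0 && c < cn);  // channel must be in range
    return (static_cast<std::size_t>(row) * cols + col) * cn + c;
}
```

For example, pixel (1, 2) of a 4-column, 3-channel matrix starts at (1*4 + 2)*3 = 18 under both formulas.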

This works for a 3-channel image: if I create an output image from this matrix (after converting it back to uint8_t), it looks the same as the input. But I want to make some changes using CUDA.

I set the block size and grid size like this:

dim3 dimBlock(count, 3);
dim3 dimGrid( frameHeight/count, frameWidth/count);

where count is the number of threads, 3 is the number of channels, and frameHeight and frameWidth are the frame dimensions.
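It can help to count what this configuration actually launches. The sketch below assumes a hypothetical 640x480 frame with count = 16 (questionLaunch and LaunchCount are illustrative names, not from the question): dimBlock(count, 3) gives 3*count threads per block and dimGrid gives (frameHeight/count)*(frameWidth/count) blocks, so the total is 3*frameWidth*frameHeight/count threads for 3*frameWidth*frameHeight float elements.

```cpp
#include <cassert>

struct LaunchCount {
    long threads;   // total threads launched
    long elements;  // floats in the frame (3 channels per pixel)
};

// Thread count of dimBlock(count, 3) with dimGrid(frameHeight/count, frameWidth/count).
LaunchCount questionLaunch(int frameWidth, int frameHeight, int count) {
    assert(frameWidth % count == 0 && frameHeight % count == 0);
    long threadsPerBlock = static_cast<long>(count) * 3;
    long blocks = static_cast<long>(frameHeight / count) * (frameWidth / count);
    return { threadsPerBlock * blocks, static_cast<long>(frameWidth) * frameHeight * 3 };
}
```

For 640x480 and count = 16 this gives 57,600 threads for 921,600 elements, so a kernel that copies one element per thread could cover at most 1/count of the matrix, regardless of how the index is computed.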

I allocated GPUexamMatrix and GPUresultMatrix and tried to access each point in the kernel. My kernel looks like this:

resultMatrix[(blockIdx.x * blockIdx.y) + (threadIdx.x * threadIdx.y)] = examMatrix[(blockIdx.x * blockIdx.y) + (threadIdx.x * threadIdx.y)];

As you can see, I simply tried to copy the matrix. After this operation, when I copied the matrix back to the host and printed it, I got very small or very large float values rather than the values from examMatrix.

I suppose I'm doing something wrong inside the kernel. Any ideas?

Upvotes: 0

Views: 153

Answers (1)

Robert Crovella

Reputation: 151799

Your usage of the built-in variables can't possibly be correct. To give just one simple example, within the same block threadIdx.x = 0 and threadIdx.y = 2 will access the same point as threadIdx.x = 2 and threadIdx.y = 0, because both products are 0. You don't have unique indexing. I could try to give you something that will work, but I'm a little confused by your dimBlock variable. In short, this is not how I would set up the grid, block, and indexing to handle a 2D array. I wouldn't use 3, the channel number, in my threadblock dimensions.
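To see the effect of the question's index expression, the host-side sketch below enumerates every blockIdx/threadIdx pair of the question's launch and collects (blockIdx.x * blockIdx.y) + (threadIdx.x * threadIdx.y). It assumes a hypothetical 640x480 frame with count = 16, and questionIndices is an illustrative name. Besides collisions like the one above, the largest index produced is 29*39 + 15*2 = 1161, so at most the first 1,162 entries of the 921,600-element result can ever be written.

```cpp
#include <cassert>
#include <set>

// Every index written by the question's kernel under
// dimBlock(count, 3) and dimGrid(frameHeight/count, frameWidth/count).
std::set<long> questionIndices(int frameWidth, int frameHeight, int count) {
    assert(frameWidth % count == 0 && frameHeight % count == 0);
    std::set<long> written;
    for (int bx = 0; bx < frameHeight / count; ++bx)      // blockIdx.x < gridDim.x
        for (int by = 0; by < frameWidth / count; ++by)   // blockIdx.y < gridDim.y
            for (int tx = 0; tx < count; ++tx)            // threadIdx.x < blockDim.x
                for (int ty = 0; ty < 3; ++ty)            // threadIdx.y < blockDim.y
                    written.insert(static_cast<long>(bx) * by + static_cast<long>(tx) * ty);
    return written;
}
```

cudaMalloc does not initialize memory, so the remaining entries of GPUresultMatrix keep whatever values happened to be there, which is consistent with the very small and very large floats reported in the question.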

Try something like this:

// make sure count is small, e.g. 16: count*count must not exceed 512 or 1024 (max threads per block, depending on GPU)
dim3 dimBlock(count, count);
dim3 dimGrid( frameWidth/dimBlock.x, frameHeight/dimBlock.y);

and in your kernel:

int row = threadIdx.y + blockIdx.y*blockDim.y;
int col = threadIdx.x + blockIdx.x*blockDim.x;

resultMatrix[3*(row*frameWidth + col) + 0] = examMatrix[3*(row*frameWidth + col) + 0];
resultMatrix[3*(row*frameWidth + col) + 1] = examMatrix[3*(row*frameWidth + col) + 1];
resultMatrix[3*(row*frameWidth + col) + 2] = examMatrix[3*(row*frameWidth + col) + 2];
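One way to check that this indexing is unique is to run the same arithmetic on the host. The C++ sketch below (answerWriteCounts is a hypothetical helper, not CUDA code) loops over every block and thread of the suggested launch, applies the row/col formulas from the kernel, and counts how often each element of the frameWidth*frameHeight*3 buffer is written. With dimensions divisible by count, every count should be exactly 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of dimGrid(frameWidth/count, frameHeight/count) x dimBlock(count, count).
std::vector<int> answerWriteCounts(int frameWidth, int frameHeight, int count) {
    assert(frameWidth % count == 0 && frameHeight % count == 0);
    std::vector<int> writes(static_cast<std::size_t>(frameWidth) * frameHeight * 3, 0);
    for (int by = 0; by < frameHeight / count; ++by)
        for (int bx = 0; bx < frameWidth / count; ++bx)
            for (int ty = 0; ty < count; ++ty)
                for (int tx = 0; tx < count; ++tx) {
                    int row = ty + by * count;  // threadIdx.y + blockIdx.y*blockDim.y
                    int col = tx + bx * count;  // threadIdx.x + blockIdx.x*blockDim.x
                    for (int c = 0; c < 3; ++c) // the three channel copies
                        ++writes[3 * (static_cast<std::size_t>(row) * frameWidth + col) + c];
                }
    return writes;
}
```

For example, answerWriteCounts(64, 48, 16) returns 9,216 entries, all equal to 1.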

The above assumes that frameWidth and frameHeight are evenly divisible by count.
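When the frame size is not a multiple of count, a common approach (not part of the answer above) is to round the grid size up, e.g. dimGrid((frameWidth + dimBlock.x - 1)/dimBlock.x, (frameHeight + dimBlock.y - 1)/dimBlock.y), and wrap the three copies in the kernel in if (row < frameHeight && col < frameWidth). The host-side C++ model below (blocksFor and guardedWriteCounts are hypothetical names) checks that this combination still writes each element exactly once, here for an illustrative 30x17 frame with count = 16.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ceiling division: enough blocks to cover n even when n % b != 0.
int blocksFor(int n, int b) { return (n + b - 1) / b; }

// Rounded-up grid plus a bounds check that skips threads outside the frame.
std::vector<int> guardedWriteCounts(int frameWidth, int frameHeight, int count) {
    assert(count > 0);
    std::vector<int> writes(static_cast<std::size_t>(frameWidth) * frameHeight * 3, 0);
    for (int by = 0; by < blocksFor(frameHeight, count); ++by)
        for (int bx = 0; bx < blocksFor(frameWidth, count); ++bx)
            for (int ty = 0; ty < count; ++ty)
                for (int tx = 0; tx < count; ++tx) {
                    int row = ty + by * count;
                    int col = tx + bx * count;
                    if (row >= frameHeight || col >= frameWidth) continue;  // bounds check
                    for (int c = 0; c < 3; ++c)
                        ++writes[3 * (static_cast<std::size_t>(row) * frameWidth + col) + c];
                }
    return writes;
}
```

The extra threads in the partial blocks simply do nothing, which is usually cheaper than special-casing the frame edges.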

Upvotes: 1
