Generwp

CUDA error when processing an image

I'm trying to produce a black-and-white image from a color input image. I'm using OpenCV to load the image and write the output, and a CUDA kernel to convert the image to black and white. The same code without OpenCV worked fine, but now the output is slightly different from what I expect.

I think the CUDA code needs some modification to work with OpenCV. I experimented with it but couldn't find a way to make it work. Could someone give me some advice or suggest changes to my code? I'm really confused by this problem.

__global__ void addMatrix(uchar4 *DataIn, unsigned char *DataOut)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    DataOut[idx] = (DataIn[idx].x + DataIn[idx].y + DataIn[idx].z)/3;
}

int main() 
{
        cudaDeviceProp deviceProp;
        cudaGetDeviceProperties(&deviceProp, 0);

        char* c = "";
        printf("Input source of image\n Example of right directory file: E:\\henrik-evensen-castle-valley-v03.jpg\n Your turn:\n");
        char *tbLEN;
        tbLEN = new char [1024];

        cin.getline(tbLEN,1024);

        cout<< endl << "Your image: " << tbLEN << endl;

        //Data for input image
        IplImage* image;
        image = cvLoadImage(tbLEN, 1);
        int height = image->height;
        int width = image->width;
        int step = image->widthStep;
        int SizeIn = (step*height);
        printf("\nProcessing image\n");
        //Data for output image
        IplImage *image2 = cvCreateImage(cvSize(width, height), IPL_DEPTH_8U, 1);
        int step2 = image2->widthStep;
        int SizeOut = step2 * height;

        //GPU
        uchar4* DatIn = (uchar4*)image->imageData;
        unsigned char* DatOut = (unsigned char*)image2->imageData;
        uchar4 *datIndev;
        unsigned char *datOutdev;

        printf("Allocating memory on Device\n");
        /* Allocate memory on Device */
        cudaMalloc(&datIndev, SizeIn * sizeof(unsigned char));
        cudaMalloc(&datOutdev, SizeOut * sizeof(unsigned char));

        printf("Copy data on Device\n");
        /* Copy data on Device */
        cudaMemcpy(datIndev, DatIn, SizeIn * sizeof(unsigned char), cudaMemcpyHostToDevice);
        cudaMemcpy(datOutdev, DatOut, SizeOut * sizeof(unsigned char), cudaMemcpyHostToDevice);

        int NumThreadsX = deviceProp.maxThreadsPerBlock;
        int NumBlocksX = (width * height)/NumThreadsX;

        dim3 blocks(NumBlocksX, 1, 1);
        dim3 threads(NumThreadsX, 1, 1);
        addMatrix <<< blocks, threads >>> (datIndev, datOutdev);


        cudaMemcpy(DatOut, datOutdev, SizeOut * sizeof(unsigned char), cudaMemcpyDeviceToHost);
        cvNamedWindow("Imagecolor");
        cvShowImage("Imagecolor", image);

        cvNamedWindow("Gray");
        cvShowImage("Gray", image2);
        const char* filename1 = "CcPwSwMW4AELPUc.jpg";
        printf("Saving an output image\n");
        cvSaveImage( filename1, image2 );
        cudaFree(datOutdev);
        cudaFree(datIndev);
        cvWaitKey(0);
        return 0;
}

[Input (color) and output (grayscale) images not reproduced]


Answers (1)

talonmies

There are several problems here:

  1. Your assumption about four-channel data is incorrect. cvLoadImage(tbLEN, 1) loads a three-channel BGR image into memory, so each pixel occupies three bytes, not four. You need to change the references from uchar4 to unsigned char and then have each thread load three bytes from the source image within your kernel.

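  2. Your kernel's averaging arithmetic is worth making explicit. In this particular expression the sum is actually safe, because C++ promotes the unsigned char operands to int before adding them (the largest possible sum, 765, fits easily). However, if the channels were accumulated in an unsigned char variable, the sum could wrap around and produce an incorrect average, so it is clearer to read the channels into a larger type such as int and do the calculation there.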

Taken together, your kernel should look something like this:

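__global__ void addMatrix(unsigned char *DataIn, unsigned char *DataOut)
{
    // One thread per pixel; assumes 3-byte pixels with no padding at the end of each row
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int b = DataIn[3*idx];      // OpenCV stores color channels in B, G, R order
    int g = DataIn[3*idx+1];
    int r = DataIn[3*idx+2];

    // Average in int, then narrow the result back to one byte
    DataOut[idx] = (unsigned char)((b + r + g)/3);
}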

Then you might find your image looks correct.
