user2380654

Image filter: Results with OpenCL differ from CPU version

I'm trying to write a simple image filter in OpenCL. The filter takes a 32bpp color image (obtained from a System::Drawing::Bitmap via LockBits(..)), converts the pixels to grayscale, and applies a 3x3 filter matrix. The resulting image should be displayable as an 8bpp Bitmap, i.e. with Format8bppIndexed.
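
For completeness, a minimal sketch of how such a buffer can be obtained with LockBits (C++/CLI; the variable names here are placeholders, not my exact code):

// Sketch only: bmp is the source Bitmap, width/height its dimensions.
System::Drawing::Imaging::BitmapData^ data = bmp->LockBits(
    System::Drawing::Rectangle(0, 0, width, height),
    System::Drawing::Imaging::ImageLockMode::ReadOnly,
    System::Drawing::Imaging::PixelFormat::Format32bppArgb);
// Assumes data->Stride == 4 * width; otherwise each row must be copied separately.
unsigned char* inputData32bpp = (unsigned char*)data->Scan0.ToPointer();
// ... run the filter on inputData32bpp ...
bmp->UnlockBits(data);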

I have a kernel that actually does something, and a single-threaded CPU solution that I think should do the same. However, the resulting images differ: the OpenCL-processed image is lighter, almost white all over, whereas the CPU image looks fine, almost as if it had only been converted to grayscale.

This is the CPU solution:

static float filter[] = { -1.0f, -1.0f, -1.0f, -1.0f, 9.0f, -1.0f, -1.0f, -1.0f, -1.0f };
static float filterNorm = 1.0f;

for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
        float gray = 0.0f;

        // Accumulate the 3x3 neighborhood, clamping coordinates to the
        // image borders (the CPU equivalent of CLK_ADDRESS_CLAMP_TO_EDGE).
        size_t ia = 0;
        for (int yi = -1; yi <= 1; ++yi) {
            for (int xi = -1; xi <= 1; ++xi) {
                int xx = x + xi;
                if (xx < 0) xx = 0;
                if (xx >= width) xx = width - 1;
                int yy = y + yi;
                if (yy < 0) yy = 0;
                if (yy >= height) yy = height - 1;
                size_t idx = 4 * (yy * width + xx);
                // Normalize each 8-bit channel to [0, 1], like CL_UNORM_INT8 does.
                float r = ((float)inputData32bpp[idx + 0] / 255.0f);
                float g = ((float)inputData32bpp[idx + 1] / 255.0f);
                float b = ((float)inputData32bpp[idx + 2] / 255.0f);
                gray += (filter[ia] * ((r + g + b) / 3.0f));
                ++ia;
            }
        }
        gray /= filterNorm;

        if (gray < 0.0f) gray = 0.0f;
        if (gray > 1.0f) gray = 1.0f;

        size_t idx8 = y * width + x;
        outputData8bpp[idx8] = (unsigned char)(gray * 255.0f);
    }
}

I'm converting to float because I want behavior similar to the OpenCL kernel, which also works on floats due to the image format (CL_UNORM_INT8). I'm aware that the channel order may be BGR instead of RGB, but that should not matter when converting to grayscale by averaging the channels, as done here.

The OpenCL host code is this:

static cl::ImageFormat formatBGRA(CL_BGRA, CL_UNORM_INT8);
static cl::ImageFormat formatGray(CL_LUMINANCE, CL_UNORM_INT8);

cl_int err = 0;
// Wrap the locked 32bpp bitmap data as the input image; the output is a
// single-channel (luminance) image of the same size.
cl::Image2D inputImage(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
    formatBGRA, width, height, 0U, inputData32bpp, &err);
cl::Image2D outputImage(context, CL_MEM_READ_WRITE,
    formatGray, width, height, 0U, NULL, &err);

cl::Kernel& imgKernel = kernels[1];
err = imgKernel.setArg(0, inputImage);
err = imgKernel.setArg(1, outputImage);

err = queue.enqueueNDRangeKernel(imgKernel, cl::NDRange(0, 0), cl::NDRange(width, height));

// Blocking read of the single-channel image; the row pitch is width bytes.
err = queue.enqueueReadImage(outputImage, CL_TRUE, cl::size_t<3>(), getRegion(width, height),
    width * sizeof(unsigned char), 0, outputData8bpp);
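
Note that err is assigned but never inspected. A minimal, hypothetical helper (not part of the original code) would make failures visible:

#include <cstdlib>
#include <iostream>

// Hypothetical helper to surface OpenCL errors after each call above,
// e.g. checkErr(err, "enqueueNDRangeKernel");
static void checkErr(cl_int err, const char* where) {
    if (err != CL_SUCCESS) {
        std::cerr << "OpenCL error " << err << " at " << where << std::endl;
        std::exit(1);
    }
}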

I'm using the C++ OpenCL API, not the C one. The host code does work; I've used it successfully with an even simpler kernel. Now, the kernel here is this:

__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
    CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

__kernel void sharpening(__read_only image2d_t inputImg, __write_only image2d_t outputImg) {

    int2 coord0 = (int2)(get_global_id(0), get_global_id(1));
    float matrix[9] = { -1.0f, -1.0f, -1.0f, -1.0f, 9.0f, -1.0f, -1.0f, 1.0f, 1.0f };

    float gray = 0.0f;
    int k = 0;
    for (int y = -1; y <= 1; ++y) {
        for (int x = -1; x <= 1; ++x) {
            int2 coord = coord0 + (int2)(x, y);
            float4 color = read_imagef(inputImg, sampler, coord);
            gray += (matrix[k] * ((color.x + color.y + color.z) /  3.0f));
            ++k;
        }
    }

    gray = clamp(gray, 0.0f, 1.0f);
    write_imagef(outputImg, coord0, (float4)(gray, gray, gray, 1));
}

Why is this not the same as the CPU version? I suppose there is a low-level problem I don't see right now. I've seen that question, which makes me worry I have a similar problem here.

In case it's important: I'm running the code on a Surface Pro 2, i.e. Intel HD Graphics.

Sorry for such a lengthy question; I hope someone can help!

Upvotes: 1

Views: 1094

Answers (2)

user2380654

OK, sorry, it was just a silly mistake: the filter matrix in the OpenCL kernel is wrong, i.e. different from the CPU version, hence the brightness difference.
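
For reference, the matrix that matches the CPU version (the last two entries were +1.0f instead of -1.0f):

float matrix[9] = { -1.0f, -1.0f, -1.0f,
                    -1.0f,  9.0f, -1.0f,
                    -1.0f, -1.0f, -1.0f };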

Concerning the rumors about 32bpp only, or about the integer/float issue: yes, it is possible to have an 8bpp grayscale image in OpenCL. The format must then be CL_UNORM_INT8, which means the pixels must be read via read_imagef. The fact that reading pixels in a kernel always returns a vector with four components does not mean the image is always 32bpp. It may be represented that way internally, but it can be fed from, or read back to, an 8bpp grayscale buffer as well; the code in my question proves this.
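
To illustrate the point inside a kernel (a sketch; grayImg and coord are placeholders):

// Per the OpenCL spec, read_imagef on a CL_LUMINANCE image returns (L, L, L, 1.0f):
float4 p = read_imagef(grayImg, sampler, coord);
float luminance = p.x;  // p.y and p.z carry the same value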

Upvotes: 1

DarkZeros

Reputation: 8410

I think the main problem may be located at outputData8bpp. Maybe you expect this to be an 8-bit output, but you are probably reading back a 32-bit output structured in L,L,L,1.0 format.
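
In other words, if outputImage were actually a four-channel format (e.g. CL_RGBA with CL_UNORM_INT8), the readback would need four bytes per pixel; a sketch, reusing getRegion from the question:

#include <vector>

// Sketch: readback for a hypothetical 4-channel output image.
std::vector<unsigned char> outputData32(width * height * 4);
err = queue.enqueueReadImage(outputImage, CL_TRUE, cl::size_t<3>(),
    getRegion(width, height), width * 4 * sizeof(unsigned char), 0,
    outputData32.data());
// outputData32 would then hold L,L,L,255 per pixel.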

Upvotes: 0
