I'm trying to write a simple image filter in OpenCL. The filter should take a 32bpp color image (obtained from a `System::Drawing::Bitmap` with `LockBits(...)`), convert the pixels to grayscale, and apply a 3x3 filter matrix. The resulting image should be displayable as an 8bpp `Bitmap`, i.e. with `Format8bppIndexed`.
I have a kernel that actually does something, and a single-threaded CPU solution which I think should do the same. However, the resulting images differ: the OpenCL-processed image is lighter, almost white all over, whereas the CPU image looks fine, almost as if it had only been converted to grayscale.
This is the CPU solution:
```cpp
static float filter[] = { -1.0f, -1.0f, -1.0f,
                          -1.0f,  9.0f, -1.0f,
                          -1.0f, -1.0f, -1.0f };
static float filterNorm = 1.0f;

for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
        float gray = 0.0f;
        size_t ia = 0;
        for (int yi = -1; yi <= 1; ++yi) {
            for (int xi = -1; xi <= 1; ++xi) {
                // Clamp the neighborhood to the image border.
                int xx = x + xi;
                if (xx < 0) xx = 0;
                if (xx >= width) xx = width - 1;
                int yy = y + yi;
                if (yy < 0) yy = 0;
                if (yy >= height) yy = height - 1;
                size_t idx = 4 * (yy * width + xx);
                float r = ((float)inputData32bpp[idx + 0] / 255.0f);
                float g = ((float)inputData32bpp[idx + 1] / 255.0f);
                float b = ((float)inputData32bpp[idx + 2] / 255.0f);
                gray += (filter[ia] * ((r + g + b) / 3.0f));
                ++ia;
            }
        }
        gray /= filterNorm;
        if (gray < 0.0f) gray = 0.0f;
        if (gray > 1.0f) gray = 1.0f;
        size_t idx8 = y * width + x;
        outputData8bpp[idx8] = (unsigned char)(gray * 255.0f);
    }
}
```
I'm converting to float because I want behavior similar to the OpenCL kernel, which also works on floats due to the image format (`CL_UNORM_INT8`). I'm aware that the channel order may be BGR instead of RGB, which should not matter when converting to grayscale by a plain average as done here.
The OpenCL host code is this:
```cpp
static cl::ImageFormat formatBGRA(CL_BGRA, CL_UNORM_INT8);
static cl::ImageFormat formatGray(CL_LUMINANCE, CL_UNORM_INT8);

cl_int err = 0;
cl::Image2D inputImage(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
                       formatBGRA, width, height, 0U, inputData32bpp, &err);
cl::Image2D outputImage(context, CL_MEM_READ_WRITE,
                        formatGray, width, height, 0U, NULL, &err);

cl::Kernel& imgKernel = kernels[1];
err = imgKernel.setArg(0, inputImage);
err = imgKernel.setArg(1, outputImage);

err = queue.enqueueNDRangeKernel(imgKernel, cl::NDRange(0, 0),
                                 cl::NDRange(width, height));
err = queue.enqueueReadImage(outputImage, true, cl::size_t<3>(),
                             getRegion(width, height),
                             width * sizeof(unsigned char), 0, outputData8bpp);
```
I'm using the C++ OpenCL API, not the C one. The host code does work; I've been using it successfully with an even simpler kernel. The kernel in question is this:
```c
__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
                               CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

__kernel void sharpening(__read_only image2d_t inputImg,
                         __write_only image2d_t outputImg) {
    int2 coord0 = (int2)(get_global_id(0), get_global_id(1));
    float matrix[9] = { -1.0f, -1.0f, -1.0f, -1.0f, 9.0f, -1.0f, -1.0f, 1.0f, 1.0f };
    float gray = 0.0f;
    int k = 0;
    for (int y = -1; y <= 1; ++y) {
        for (int x = -1; x <= 1; ++x) {
            int2 coord = coord0 + (int2)(x, y);
            float4 color = read_imagef(inputImg, sampler, coord);
            gray += (matrix[k] * ((color.x + color.y + color.z) / 3.0f));
            ++k;
        }
    }
    gray = clamp(gray, 0.0f, 1.0f);
    write_imagef(outputImg, coord0, (float4)(gray, gray, gray, 1));
}
```
Why is this not the same as the CPU version? I suppose there is a low-level problem I don't see right now. I've also seen that question, which makes me worry I'm running into a similar problem here.
In case it's important: I'm running the code on a Surface Pro 2, i.e. Intel HD Graphics.
Sorry for such a lengthy question; I hope someone can help!
Upvotes: 1
Views: 1094
Ok, sorry, it was obviously just a silly mistake: the filter matrix in the OpenCL kernel is wrong, i.e. different from the CPU version, hence the brightness difference.
Concerning the rumors that OpenCL images are 32bpp only, or about the integer/float issue: yes, it is possible to have an 8bpp grayscale image in OpenCL. The format must then be `CL_UNORM_INT8`, which means the pixels must be read via `read_imagef`. The fact that reading a pixel in a kernel always returns a vector with four components does not mean the image is always 32bpp. It may be represented like that internally, but it can be fed from, or read back into, an 8bpp grayscale buffer as well - the code in my question proves this.
Upvotes: 1
Reputation: 8410
I think the main problem may be located at `outputData8bpp`. Maybe you expect this to be an 8-bit output, but you are probably getting a 32-bit output structured in L,L,L,1.0 format.
Upvotes: 0