Reputation: 165
I am writing an example program that converts an RGB image to a grayscale one. The image is copied from the host to the device as a one-dimensional array, called imgIn in my code. Since imgIn is an RGB image, each pixel consists of 3 unsigned char components (R, G and B). Since the output (imgOut) is a grayscale image, it has only one channel (luminosity). The code follows:
__kernel void rgbToGray(__global const uchar* restrict imgIn,
                        __global uchar* restrict imgOut) {
    // Get the two indexes of the work item
    int x = get_global_id(0);
    int y = get_global_id(1);
    // The RGB average is the luminosity
    //uchar3 channels = *(((__global uchar3 *) imgIn) + (x+640*y));
    uchar3 channels = *((__global uchar3 *) (imgIn + 3*(x+640*y)));
    channels = channels / (uchar3)(3);
    imgOut[x+640*y] = channels.s0 + channels.s1 + channels.s2;
}
I would like to understand why the commented-out declaration of uchar3 channels is not equivalent to the uncommented one. When I offset my uchar pointer to the correct pixel and then cast it to a uchar3 pointer, the channels variable has the correct value and my output image is perfect. But when I cast the pointer to a uchar3 one first and then offset it to the (supposedly) right pixel, my image shows a strange pattern, which is depicted in the image linked below.
https://i.sstatic.net/pjHuR.jpg
Upvotes: 4
Views: 1781
Reputation: 3381
A uchar3 (in fact, any three-component vector type) has the same alignment and size as the four-component vector of the corresponding type. So a uchar3 is really just a uchar4 with syntactic sugar on top to prevent you from accessing the last component; it is still 4 bytes in size.
So your first line here

uchar3 channels = *(((__global uchar3 *) imgIn) + (x+640*y));

fails because when you do pointer arithmetic with your uchar3 *, you end up incrementing by 4 bytes times (x+640*y), whereas you only wanted to increment by 3 bytes times (x+640*y). So you end up skipping one byte per pixel, and that gives you the distorted result you show in your screenshot.
However, your second line

uchar3 channels = *((__global uchar3 *) (imgIn + 3*(x+640*y)));

works correctly because you are calculating the correct byte offset manually and then casting the offset pointer to a uchar3 *, which gets you the right pixel bytes. However, I believe it is still technically undefined behavior if imgIn + 3*(x+640*y) is not aligned to a 4-byte boundary. If I am wrong (which is very possible) someone can correct me, but otherwise I'd suggest either passing your pixels 4 bytes at a time with an unused padding byte, or unpacking the 3 bytes into a uchar3 manually without going through a pointer reinterpret cast (EDIT: or, rather, use vload3 as prunge suggests; I forgot about that one).
My recommendation is to just say no to three-component vectors as kernel inputs and outputs. You can use them inside your kernels, but it is just confusing to think of them as literal uchar[3] types.
Upvotes: 2
Reputation: 23258
According to the section on data types in the spec:
For 3-component vector data types, the size of the data type is 4 * sizeof(component). This means that a 3-component vector data type will be aligned to a 4 * sizeof(component) boundary. The vload3 and vstore3 built-in functions can be used to read and write, respectively, 3-component vector data types from an array of packed scalar data type.
If you need to read 3-component vector values, use vload3. The doco for this explicitly states that it will only read 3 values from memory:
vload3 and vload_half3 read x, y, z components from address (p + (offset * 3)) into a 3-component vector.
So something like this should work:
uchar3 channels = vload3(x + 640 * y, imgIn);
Upvotes: 4