Reputation:
I am trying to perform an operation inside an OpenCL kernel. I have a buffer named filter that holds a 3x3 matrix with every element initialized to 1.
I pass it as an argument to the OpenCL kernel from the host side. The problem appears when I read this buffer on the device side as float3 vectors. For example:
// The kernel name is a placeholder; the original snippet omitted it.
__kernel void read_filter(constant float3* restrict filter)
{
    float3 temp1 = filter[0];
    float3 temp2 = filter[1];
    float3 temp3 = filter[2];
}
The first two variables behave as expected: all of their components are 1. The third variable (temp3), however, has only its x component set to 1; its y and z components are 0. When I read the buffer as plain float values instead, everything behaves as expected. Am I doing something wrong? I would rather not use the vload instructions because of the overhead they add.
Upvotes: 1
Views: 77
Reputation: 1289
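In OpenCL, float3 has the same size and alignment as float4, so indexing a float3 pointer advances by four floats per element, not three. Your 9 values therefore cover the x, y, z, and w (padding) slots of temp1 and temp2, which leaves just one value for temp3.x; the y and z components of temp3 are read from past the end of your data. You will probably need to use the vload3 built-in function.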
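The indexing described above can be checked with a small host-side model. This is only a sketch in plain C, not kernel code: the helper names float3_lane_index and lanes_in_bounds are hypothetical, and the stride of 4 floats is taken from the size rule for 3-component vectors.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a float3 element occupies 4 * sizeof(float), so consecutive
   elements of a float3 array are 4 floats apart. */
enum { FLOAT3_STRIDE = 4 };

/* Index into the underlying float buffer read by lane `lane`
   (0 = x, 1 = y, 2 = z) of element filter[i]. Hypothetical helper. */
size_t float3_lane_index(size_t i, size_t lane)
{
    assert(lane < 3);
    return i * FLOAT3_STRIDE + lane;
}

/* Number of lanes of filter[i] that fall inside a buffer of n floats.
   Hypothetical helper. */
size_t lanes_in_bounds(size_t i, size_t n)
{
    size_t count = 0;
    for (size_t lane = 0; lane < 3; ++lane) {
        if (float3_lane_index(i, lane) < n) {
            ++count;
        }
    }
    return count;
}
```

For a 9-float buffer, filter[2] starts at float index 8, so only its x lane lies inside the data, which matches the behaviour reported in the question.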
See section 6.1.5, "Alignment of Types", of the OpenCL specification for more information:
For 3-component vector data types, the size of the data type is 4 * sizeof(component). This means that a 3-component vector data type will be aligned to a 4 * sizeof(component) boundary. The vload3 and vstore3 built-in functions can be used to read and write, respectively, 3-component vector data types from an array of packed scalar data type.
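To illustrate the packed addressing that vload3 uses, here is a minimal plain C sketch of the same arithmetic. The vec3 struct and the load3_packed helper are hypothetical stand-ins for float3 and vload3; in an actual kernel the buffer would presumably be declared as constant float* restrict filter and read with vload3(i, filter), which loads the three floats starting at filter + 3 * i.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the OpenCL float3 type. */
typedef struct {
    float x, y, z;
} vec3;

/* Mirrors the addressing of vload3(offset, p): reads the three packed
   floats p[3 * offset] .. p[3 * offset + 2]. The extra parameter n is the
   buffer length in floats and is used only for a bounds check. */
vec3 load3_packed(size_t offset, const float *p, size_t n)
{
    assert(3 * offset + 2 < n);
    vec3 v = { p[3 * offset], p[3 * offset + 1], p[3 * offset + 2] };
    return v;
}
```

With a packed 9-float matrix, offsets 0, 1, and 2 return rows 0, 1, and 2 in full, because the stride is 3 floats rather than the 4 floats implied by indexing a float3 pointer.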
Upvotes: 2