A. Student

Reputation: 57

How can I read an array of structures in an OpenCL kernel?

The requirement:

Let's say we have 1) five groups of colors, each with three colors (the colors are generated dynamically on the CPU), and 2) a list of 1000 cars, where each car is represented in the list by its color (picked from one of the groups). We want to pass three arguments to an OpenCL kernel: 1) the generated color groups, 2) a 1D array of car colors, and 3) a 1D integer array that receives the result of testing each car's color against the color groups (a simple calculation).

The structures:

typedef struct
{
   float4 Color1; //16 =2^4
   float4 Color2; //16 =2^4
   float4 Color3; //16 =2^4
   float4 Color4; //16 =2^4
} GeneratedColorGroup; //64 = 2^6

typedef struct
{
    GeneratedColorGroup Colors[8]; //512 = 2^9
} ColorGroup;
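On the host side, the buffer passed as InColorGroups has to match this layout byte for byte. A minimal sketch of a host-side mirror in C, assuming a GCC/Clang-compatible compiler; HostFloat4 and the other Host* names are hypothetical stand-ins (real host code would normally use cl_float4 from the OpenCL headers). The static assertions check the sizes noted in the comments above at compile time.

```c
#include <stddef.h>

/* Hypothetical stand-in for cl_float4: four floats, 16-byte aligned
   like OpenCL C's float4. */
typedef struct {
    float s[4];
} __attribute__((aligned(16))) HostFloat4;

/* Host mirrors of the kernel structs; field order must match exactly. */
typedef struct {
    HostFloat4 Color1;
    HostFloat4 Color2;
    HostFloat4 Color3;
    HostFloat4 Color4;
} HostGeneratedColorGroup;

typedef struct {
    HostGeneratedColorGroup Colors[8];
} HostColorGroup;

/* Compile-time checks against the sizes in the struct comments. */
_Static_assert(sizeof(HostFloat4) == 16, "float4 should be 16 bytes");
_Static_assert(sizeof(HostGeneratedColorGroup) == 64, "4 x 16 bytes");
_Static_assert(offsetof(HostGeneratedColorGroup, Color4) == 48, "no padding between colors");
_Static_assert(sizeof(HostColorGroup) == 512, "8 x 64 bytes");
```

Because every field is a 16-byte vector, no padding is expected here. The assertions are cheap insurance if the structs later gain a scalar field.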

The kernel code:

__kernel void findCarColorRelation(
const __global ColorGroup *InColorGroups,
const __global float4* InCarColor,
__global int* CarGroupIndicator,
const int carsNumber)
{
    int globalID = get_global_id( 0 );
    if(globalID < carsNumber)
    {
        ColorGroup colorGroups;
        float4 carColor;
        colorGroups = InColorGroups[globalID];
        carColor = InCarColor[globalID];

        for(int groupIndex =0; groupIndex < 8; groupIndex++)
        {
            // float4 == float4 yields an int4 mask; all() reduces it to one boolean
            if(all(colorGroups.Colors[groupIndex].Color1 == carColor))
            {
                CarGroupIndicator[globalID] = groupIndex + 1 ;
                break;
            }

            if(all(colorGroups.Colors[groupIndex].Color2 == carColor))
            {
                CarGroupIndicator[globalID] = groupIndex * 2 + 2;
                break;
            }

            if(all(colorGroups.Colors[groupIndex].Color3 == carColor))
            {
                CarGroupIndicator[globalID] = groupIndex * 3 + 3;
                break;
            }
        }
    }

}
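To check the kernel's output on the host, a plain C reference of the same matching loop can help. A minimal sketch, assuming every car is tested against one shared ColorGroup (unlike the kernel, which reads InColorGroups[globalID]); Color4f, GenGroup, Groups and find_car_color_relation are hypothetical names. It keeps the kernel's indicator formulas and writes 0 when nothing matches.

```c
#include <stdbool.h>

/* Hypothetical host-side types mirroring the kernel structs. */
typedef struct { float s[4]; } Color4f;
typedef struct { Color4f Color1, Color2, Color3, Color4; } GenGroup;
typedef struct { GenGroup Colors[8]; } Groups;

/* Plain-C equivalent of OpenCL's all(a == b) on float4:
   true only if every component matches. */
static bool color_equal(Color4f a, Color4f b) {
    for (int i = 0; i < 4; ++i)
        if (a.s[i] != b.s[i]) return false;
    return true;
}

/* CPU reference of the kernel loop, using the kernel's indicator formulas.
   0 marks "no match" (the kernel leaves the output untouched instead). */
static void find_car_color_relation(const Groups *g, const Color4f *cars,
                                    int *indicator, int carsNumber) {
    for (int id = 0; id < carsNumber; ++id) {
        indicator[id] = 0;
        for (int gi = 0; gi < 8; ++gi) {
            if (color_equal(g->Colors[gi].Color1, cars[id])) { indicator[id] = gi + 1; break; }
            if (color_equal(g->Colors[gi].Color2, cars[id])) { indicator[id] = gi * 2 + 2; break; }
            if (color_equal(g->Colors[gi].Color3, cars[id])) { indicator[id] = gi * 3 + 3; break; }
        }
    }
}
```

Comparing this array with the buffer read back from the device separates logic errors from host/device data-transfer problems.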

Now, we have 1000 items, which means the kernel is going to be executed 1000 times (one work-item per car). That's OK.
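The `if(globalID < carsNumber)` guard matters because the global work size is often rounded up to a multiple of the work-group size. In OpenCL 1.x, when a local size is passed explicitly, the global size must be divisible by it. A minimal sketch of that rounding; round_up_global_size is a hypothetical helper and 64 is only an illustrative work-group size.

```c
#include <stddef.h>

/* Round the number of items up to a multiple of the work-group size.
   Work-items beyond `items` must do nothing, hence the kernel's guard. */
static size_t round_up_global_size(size_t items, size_t localSize) {
    return ((items + localSize - 1) / localSize) * localSize;
}
```

For 1000 cars and a work-group size of 64 this gives 1024, so 24 work-items fall outside the data and are filtered out by the guard.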

The problem: as you can see, the kernel takes a global ColorGroup as input, and this global memory holds five items of type GeneratedColorGroup.

I tried to access these items as shown in the code above, but I got unexpected results, and the execution is very slow.

What is wrong with my code? Any help is highly appreciated.

Upvotes: 0

Views: 903

Answers (2)

mogu

Reputation: 1129

I'm wildly guessing the problem is

... CarGroupIndicator[globalID] = groupIndex + 1 ;
... CarGroupIndicator[globalID] = groupIndex * 2 + 2;
... CarGroupIndicator[globalID] = groupIndex * 3 + 3;

... which makes it impossible to tell from the result CarGroupIndicator[globalID] exactly what was matched. E.g. (with the 0-based groupIndex) a match on group 5 color 1 results in the value 6, but so do a match on group 2 color 2 and a match on group 1 color 3. What you want is something like this:
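The collision is easy to confirm by brute force. A small sketch that re-implements the three formulas from the question (old_indicator and count_pairs_with_value are hypothetical names) and counts how many (group, color) pairs produce a given value:

```c
/* The question's indicator formulas; groupIndex is 0-based (0..7),
   colorIndex is 1, 2 or 3 for Color1..Color3. */
static int old_indicator(int groupIndex, int colorIndex) {
    switch (colorIndex) {
    case 1:  return groupIndex + 1;
    case 2:  return groupIndex * 2 + 2;
    default: return groupIndex * 3 + 3;
    }
}

/* Number of (group, color) pairs that map to `value`.
   Anything above 1 means the result cannot be decoded. */
static int count_pairs_with_value(int value) {
    int n = 0;
    for (int g = 0; g < 8; ++g)
        for (int c = 1; c <= 3; ++c)
            if (old_indicator(g, c) == value) ++n;
    return n;
}
```

count_pairs_with_value(6) returns 3, matching the three cases listed above.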

... CarGroupIndicator[globalID] = groupIndex;
... CarGroupIndicator[globalID] = groupIndex + 8;
... CarGroupIndicator[globalID] = groupIndex + 16;

... then 0-7 are Color1, 8-15 Color2, and 16-23 Color3.
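With this scheme, the host can decode each result with a division and a modulo. A minimal sketch, assuming a 0-based colorIndex (0 = Color1, 1 = Color2, 2 = Color3) and 8 groups; encode_indicator, decode_group and decode_color are hypothetical names.

```c
/* Unambiguous indicator: colorIndex * 8 + groupIndex.
   Color1 -> 0..7, Color2 -> 8..15, Color3 -> 16..23. */
static int encode_indicator(int groupIndex, int colorIndex) {
    return colorIndex * 8 + groupIndex;
}

/* Inverse mapping on the host side. */
static int decode_group(int indicator) { return indicator % 8; }
static int decode_color(int indicator) { return indicator / 8; }
```

Since 0 is now a valid result (group 0, Color1), "no match" needs a different sentinel, for example initializing the output buffer to -1 before launching the kernel.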

Upvotes: 0

Andrew Savonichev

Reputation: 699

When passing structs from the host to the device, make sure you declare the struct type with __attribute__ ((packed)) in both the host and the device code. Otherwise, the host and device compilers may produce different memory layouts for the struct, i.e. they may insert different amounts of padding.
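A small illustration of how padding changes a layout, using hypothetical structs Natural and Packed compiled with GCC or Clang. The natural layout typically places `value` at offset 4 (3 padding bytes after `tag`), while the packed layout places it at offset 1. If only one side of the host/device boundary uses the attribute, the two sides disagree about where `value` lives.

```c
#include <stddef.h>

/* Natural layout: most ABIs insert 3 padding bytes after `tag`
   so that `value` is 4-byte aligned. */
struct Natural {
    char  tag;
    float value;
};

/* Same fields, packed: no padding, `value` follows `tag` directly.
   The same attribute must be used in the host and kernel declarations. */
struct __attribute__((packed)) Packed {
    char  tag;
    float value;
};

_Static_assert(offsetof(struct Packed, value) == 1, "no padding when packed");
```

The offset of 4 for struct Natural is what common x86-64 and ARM ABIs produce. The language only guarantees that `value` comes after `tag`.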

Using packed structs may degrade performance: packed structs have no padding at all, so fields within a struct may not be properly aligned, and unaligned access is usually slow. In that case, you have to either insert padding manually with char[] fields, or use __attribute__ ((aligned (N))) on a struct field (or on the struct itself).
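Both remedies can be sketched in C for GCC/Clang; PaddedByHand and WithAlignedField are hypothetical names. The first keeps the struct packed but restores 4-byte alignment of `value` with an explicit char[3] filler. The second asks for 16-byte alignment on a single field, the alignment OpenCL C uses for float4.

```c
#include <stddef.h>

/* Packed, but with explicit padding so `value` stays 4-byte aligned
   relative to the struct start, identically on host and device. */
struct __attribute__((packed)) PaddedByHand {
    char  tag;
    char  pad[3];
    float value;
};

/* Alignment requested on one field: `data` starts on a 16-byte boundary. */
struct WithAlignedField {
    char  tag;
    float data[4] __attribute__((aligned(16)));
};

_Static_assert(offsetof(struct PaddedByHand, value) == 4, "manual padding");
```

With these declarations, `data` lands at offset 16 and struct WithAlignedField occupies 32 bytes. Spelling the padding out by hand makes the layout explicit instead of leaving it to each compiler.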

See the OpenCL C specification for details on packed and aligned attributes: https://www.khronos.org/registry/OpenCL/sdk/1.1/docs/man/xhtml/attributes-types.html

Upvotes: 1
