How to pass a C struct with arrays and variables to an OpenCL kernel using PyOpenCL

So, I have to pass some data to an OpenCL kernel using PyOpenCL, or some workaround in Python. The data is read on the kernel side as a struct, and I can't change the kernel because it works fine and is part of a much bigger project that my code must work with.

The kernel looks like this:

typedef struct VglClStrEl{ 
    float data[VGL_ARR_CLSTREL_SIZE];
    int ndim;
    int shape[VGL_ARR_SHAPE_SIZE];
    int offset[VGL_ARR_SHAPE_SIZE];
    int size;
} VglClStrEl;

typedef struct VglClShape{ 
    int ndim;
    int shape[VGL_ARR_SHAPE_SIZE];
    int offset[VGL_ARR_SHAPE_SIZE];
    int size;
} VglClShape;

__kernel void kernel(__global unsigned char* img_input, 
                     __global unsigned char* img_output,  
                     __constant VglClShape* img_shape,
                     __constant VglClStrEl* window)
{

    // do what is needed

}

So, as you can see, the VglClShape and VglClStrEl structures contain arrays of different types alongside fixed-size scalar variables.

The workaround in [1] only supports structs whose arrays all share one type (or I failed to find a way to make it work with multiple array types).

The approach in [2] is the PyOpenCL documentation's reference for passing Python data to an OpenCL kernel struct. It doesn't support arrays at all.

So, how can I pass the Python data in a form the OpenCL kernel can read? I already have all the data on the Python side; I just need to know how to get it from Python to the kernel.

Before you ask: I am using Python 3 and I CAN NOT CHANGE THE KERNEL.

And yes, the array sizes are static. You can assume something like this:

VGL_ARR_CLSTREL_SIZE=256;
VGL_ARR_SHAPE_SIZE=20;

[1] Passing struct with pointer members to OpenCL kernel using PyOpenCL

[2] https://documen.tician.de/pyopencl/howto.html#how-to-use-struct-types-with-pyopencl


Answers (1)

Colin Stark


There is a hackish way to do this that requires some tedious byte wrangling. Presumably you are OK with deploying a small OpenCL probing kernel? (PyOpenCL does this under the hood for some operations in any case.)

The basic idea is to:

  • find out how the OpenCL device aligns all the elements of your structs by running a single-instance kernel
  • create a numpy byte array to match the size of the OpenCL struct
  • byte-wise copy each element of your Python struct into this array
  • when invoking your unchangeable OpenCL kernel, pass this array via a bag-of-bytes buffer

The following kernel does the job:

__kernel void get_struct_sizes( __global uint *struct_sizes )
{
    // Assumes the VglClStrEl and VglClShape typedefs (and the VGL_ARR_* sizes)
    // are defined earlier in the same program source.
    const uint global_id = get_global_id(0u)+get_global_id(1u)*get_global_size(0u);
    VglClStrEl vgclstrel;
    VglClShape vgclshape;
    uint offset;

    printf("In GPU (probing):\n Kernel instance = %u\n", global_id);

    if (global_id==0) {
        // [0] = sizeof(VglClStrEl); [1..4] = byte offsets of ndim, shape, offset, size
        offset = (uint)&(vgclstrel.data);
        struct_sizes[0] = (uint)sizeof(vgclstrel);
        struct_sizes[1] = (uint)&(vgclstrel.ndim)-offset;
        struct_sizes[2] = (uint)&(vgclstrel.shape)-offset;
        struct_sizes[3] = (uint)&(vgclstrel.offset)-offset;
        struct_sizes[4] = (uint)&(vgclstrel.size)-offset;
        // [5] = sizeof(VglClShape); [6..8] = byte offsets of shape, offset, size
        offset = (uint)&(vgclshape.ndim);
        struct_sizes[5] = (uint)sizeof(vgclshape);
        struct_sizes[6] = (uint)&(vgclshape.shape)-offset;
        struct_sizes[7] = (uint)&(vgclshape.offset)-offset;
        struct_sizes[8] = (uint)&(vgclshape.size)-offset;
    }
    return;
}

Execute this kernel, read struct_sizes back to the host, and take its VglClShape entries (indices 5 to 8) as vgclshape_sizes. Then create this array:

img_shape  = np.zeros((vgclshape_sizes[0]), dtype=np.uint8)

and copy into it what you need:

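def copy_into_byte_array(value, byte_array, offset, dtype=np.int32):
    # An OpenCL int is 32 bits wide; np.asarray also accepts plain Python
    # scalars and lists, which np.ndarray.tobytes(value) would reject.
    raw = np.asarray(value, dtype=dtype).tobytes()
    start = int(offset)
    byte_array[start:start + len(raw)] = np.frombuffer(raw, dtype=np.uint8)

copy_into_byte_array(ndim,   img_shape, 0)
copy_into_byte_array(shape,  img_shape, vgclshape_sizes[1])
copy_into_byte_array(offset, img_shape, vgclshape_sizes[2])
copy_into_byte_array(size,   img_shape, vgclshape_sizes[3])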
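
The packing steps above could be sketched end to end as follows, covering both structs. This is only an illustration: the `struct_sizes` values are hypothetical stand-ins for what the probing kernel would return (they assume 4-byte int/float and no padding), `pack_struct` is a helper name made up here, and the field values are invented example data, not anything taken from the real project.

```python
import numpy as np

VGL_ARR_CLSTREL_SIZE = 256
VGL_ARR_SHAPE_SIZE = 20

# Hypothetical probe result, in the order the probing kernel writes it:
# [sizeof(VglClStrEl), offsets of ndim, shape, offset, size,
#  sizeof(VglClShape), offsets of shape, offset, size].
# A real run should take these numbers from the device instead.
struct_sizes = np.array([1192, 1024, 1028, 1108, 1188, 168, 4, 84, 164],
                        dtype=np.uint32)
strel_sizes, shape_sizes = struct_sizes[:5], struct_sizes[5:]


def pack_struct(total_size, fields):
    """Copy (offset, value, dtype) fields into a zeroed byte array."""
    buf = np.zeros(int(total_size), dtype=np.uint8)
    for offset, value, dtype in fields:
        raw = np.asarray(value, dtype=dtype).tobytes()
        start = int(offset)
        buf[start:start + len(raw)] = np.frombuffer(raw, dtype=np.uint8)
    return buf


# Invented example data: a 3x3 window of ones, padded to the fixed array sizes.
data = np.zeros(VGL_ARR_CLSTREL_SIZE, dtype=np.float32)
data[:9] = 1.0
dims = np.zeros(VGL_ARR_SHAPE_SIZE, dtype=np.int32)
dims[:2] = 3
offsets = np.zeros(VGL_ARR_SHAPE_SIZE, dtype=np.int32)

window = pack_struct(strel_sizes[0], [
    (0,              data,    np.float32),  # data is the first member
    (strel_sizes[1], 2,       np.int32),    # ndim
    (strel_sizes[2], dims,    np.int32),    # shape
    (strel_sizes[3], offsets, np.int32),    # offset
    (strel_sizes[4], 9,       np.int32),    # size
])

img_shape = pack_struct(shape_sizes[0], [
    (0,              2,       np.int32),    # ndim is the first member
    (shape_sizes[1], dims,    np.int32),    # shape
    (shape_sizes[2], offsets, np.int32),    # offset
    (shape_sizes[3], 9,       np.int32),    # size
])

print(window.nbytes, img_shape.nbytes)
```

Each byte array can then be handed to PyOpenCL as an ordinary host buffer, for example with `cl.Buffer(ctx, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR, hostbuf=img_shape)`, and passed as the `__constant` argument.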

I've skipped some steps here; filling them in you'll find this approach works. I was able to pass a demo struct to a dummy copy of your inviolate kernel.

I would be interested to hear if there are more elegant ways to do any or all of these steps. I would also expect problems with endianness, etc., that would otherwise be transparent. With luck you can work around them.
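
One possibly tidier variant, offered as a sketch rather than a tested recipe: a NumPy structured dtype can declare fixed-size sub-array fields, so the host-side layout can be described once and filled by field name. The dtype names below are made up for illustration, and the layout is only an assumption about what the device uses; the sizes and offsets it prints would still need to be compared against the probing kernel's output before trusting it.

```python
import sys

import numpy as np

VGL_ARR_CLSTREL_SIZE = 256
VGL_ARR_SHAPE_SIZE = 20

# Host-side descriptions of the two structs (hypothetical names).
# align=True asks NumPy for C-like alignment of the members.
vgl_shape_dtype = np.dtype([
    ("ndim",   np.int32),
    ("shape",  np.int32, (VGL_ARR_SHAPE_SIZE,)),
    ("offset", np.int32, (VGL_ARR_SHAPE_SIZE,)),
    ("size",   np.int32),
], align=True)

vgl_strel_dtype = np.dtype([
    ("data",   np.float32, (VGL_ARR_CLSTREL_SIZE,)),
    ("ndim",   np.int32),
    ("shape",  np.int32, (VGL_ARR_SHAPE_SIZE,)),
    ("offset", np.int32, (VGL_ARR_SHAPE_SIZE,)),
    ("size",   np.int32),
], align=True)

# One zero-initialised struct instance, filled by field name
# with invented example values.
img_shape = np.zeros(1, dtype=vgl_shape_dtype)
img_shape["ndim"][0] = 2
img_shape["shape"][0, :2] = (480, 640)
img_shape["size"][0] = 480 * 640

# Sizes and member offsets to compare with the probing kernel's results.
print("VglClShape size:", vgl_shape_dtype.itemsize)
print("VglClStrEl size:", vgl_strel_dtype.itemsize)
for name in vgl_strel_dtype.names:
    print(name, vgl_strel_dtype.fields[name][1])

# Host byte order; the device side can be queried via CL_DEVICE_ENDIAN_LITTLE.
print("host byte order:", sys.byteorder)
```

If the printed sizes and offsets agree with the probe and the byte orders match, `img_shape` can be used as the host buffer directly; if they differ, the byte-wise copy driven by the probed offsets remains the safer route.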
