clEnqueueNDRangeKernel triggers CL_INVALID_MEM_OBJECT (-38)

I am using the C++ bindings for OpenCL, and when enqueuing one of my kernels I get a cl::Error with code -38 (CL_INVALID_MEM_OBJECT) from clEnqueueNDRangeKernel.

This error is not listed as one of the possible errors of clEnqueueNDRangeKernel. The notify function gives me the following output:

CL_INVALID_MEM_OBJECT error executing CL_COMMAND_NDRANGE_KERNEL on GeForce GTX 560 (Device 0).

I have yet to find a minimal example that exhibits this behaviour.

What can cause this kind of error when calling this function?

Searching with Google, so far I have only found this answer. It states that I need to call clSetKernelArg again for an attached memory object if it has been updated (at least that is my interpretation of it; there is no detailed explanation of what "updated" means). However, I doubt that this is correct, although I cannot prove it. Maybe you know an official source on this?
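
For reference, this is roughly what I understand that suggestion to mean; queue, buffer, bytes and hostData are placeholder names, not the actual ones in my code:

//after the buffer contents change ("update"), attach it to the kernel again
queue.enqueueWriteBuffer(buffer, CL_TRUE, 0, bytes, &hostData[0]);
kernel.setArg(0, buffer); //re-set the argument before the next enqueue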

Update

After some testing I found that adding a __global const float* parameter to the kernel introduced the error. I also found that the error occurs on every run only if I call clSetKernelArg for this new argument after another (already existing) argument. If I set it before the other argument, it fails only every second run. Of course this is not an option, as I need to be able to set the argument at any time.
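
Roughly, the two orderings look like this (existingBuffer and newBuffer are placeholder names, not the ones from my code):

//fails every run: the new argument is set after the already existing one
kernel.setArg(0, existingBuffer);
kernel.setArg(1, newBuffer);

//fails only every second run: the new argument is set first
kernel.setArg(1, newBuffer);
kernel.setArg(0, existingBuffer);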

Update 2

I noticed that stepping through the code with a debugger "reintroduces" the error in the version where I set the new argument before the other one. (This means the error again occurs on every run.)

Could this be some kind of race condition? I do not use multithreading myself, but in the debugger there are 7 threads, which could come from Qt or OpenCL.

Minimal Working Example

#define __CL_ENABLE_EXCEPTIONS //make failures throw cl::Error
#include <CL/cl.hpp>
#include <vector>
#include <iostream>

#define STRINGIFY(x) #x

std::string kernel = STRINGIFY(
__kernel void apply(__global const float *param1)
{
}
);


template <class T>
cl::Buffer genBuffer(const cl::Context &context, const std::vector<T> &data,
                        cl_mem_flags flags = CL_MEM_READ_ONLY)
{
        return cl::Buffer(context, flags | CL_MEM_COPY_HOST_PTR,
                                data.size() * sizeof(data[0]),
                                const_cast<T*>(&data[0]));
}

int main()
{
        std::vector<cl::Platform> clPlatforms;
        cl::Platform::get(&clPlatforms);
        cl_context_properties props[] = {
                CL_CONTEXT_PLATFORM, (cl_context_properties)clPlatforms[0](),
                0};
        cl::Context clContext = cl::Context(CL_DEVICE_TYPE_GPU, props);
        std::vector<cl::Device> devices = clContext.getInfo<CL_CONTEXT_DEVICES>();
        if(devices.empty())
        {
                std::cerr << "No devices found!\n";
                exit(-1);
        }
        cl::Device clDevice = devices[0];
        cl::CommandQueue clQueue = cl::CommandQueue(clContext, clDevice, 0, 0);
        cl::Program program(clContext, cl::Program::Sources(1,
                                std::make_pair(kernel.c_str(), kernel.size())));
        program.build(devices);
        cl::Kernel kernel(program, "apply");

        //this introduces the error
        kernel.setArg(0, genBuffer(clContext, std::vector<cl_float>(100)));
        //the error is triggered here
        clQueue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(100), cl::NullRange);
}

Upvotes: 4

Views: 2069

Answers (1)

The problem was that I attached a buffer to a kernel assuming the kernel would retain the buffer. I then destroyed all referencing cl::Buffer/cl::Memory objects, which caused the OpenCL implementation to delete the buffer.


After running my program through Valgrind, I noticed that opencl.so accessed memory of an object that had been freed earlier in the cl::Buffer destructor. Reading up on clSetKernelArg, I noticed:

Users may not rely on a kernel object to retain objects specified as argument values to the kernel.

The nondeterministic behaviour is clearly a result of the driver accessing a freed memory area, i.e. undefined behaviour.
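
To make this concrete, here is what I believe happens in the broken version of the MWE (the comments describe my understanding of the cl.hpp wrapper, whose cl::Buffer destructor calls clReleaseMemObject):

//the temporary returned by genBuffer dies at the end of the full expression
kernel.setArg(0, genBuffer(clContext, std::vector<cl_float>(100)));
//~Buffer() of the temporary calls clReleaseMemObject, the reference count
//drops to zero and the cl_mem is freed; the kernel keeps only the stale handle
clQueue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(100), cl::NullRange);
//-> CL_INVALID_MEM_OBJECT (or worse, silent use of freed memory)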

Corrected MWE

#define __CL_ENABLE_EXCEPTIONS //make failures throw cl::Error
#include <CL/cl.hpp>
#include <vector>
#include <iostream>

#define STRINGIFY(x) #x

std::string kernel = STRINGIFY(
__kernel void apply(__global const float *param1)
{
}
);


template <class T>
cl::Buffer genBuffer(const cl::Context &context, const std::vector<T> &data,
                        cl_mem_flags flags = CL_MEM_READ_ONLY)
{
        return cl::Buffer(context, flags | CL_MEM_COPY_HOST_PTR,
                                data.size() * sizeof(data[0]),
                                const_cast<T*>(&data[0]));
}

int main()
{
        std::vector<cl::Platform> clPlatforms;
        cl::Platform::get(&clPlatforms);
        cl_context_properties props[] = {
                CL_CONTEXT_PLATFORM, (cl_context_properties)clPlatforms[0](),
                0};
        cl::Context clContext = cl::Context(CL_DEVICE_TYPE_GPU, props);
        std::vector<cl::Device> devices = clContext.getInfo<CL_CONTEXT_DEVICES>();
        if(devices.empty())
        {
                std::cerr << "No devices found!\n";
                exit(-1);
        }
        cl::Device clDevice = devices[0];
        cl::CommandQueue clQueue = cl::CommandQueue(clContext, clDevice, 0, 0);
        cl::Program program(clContext, cl::Program::Sources(1,
                                std::make_pair(kernel.c_str(), kernel.size())));
        program.build(devices);
        cl::Kernel kernel(program, "apply");

        //this version triggers the error
        //kernel.setArg(0, genBuffer(clContext, std::vector<cl_float>(100)));

        //This is how it is done correctly
        cl::Buffer buffer = genBuffer(clContext, std::vector<cl_float>(100));
        kernel.setArg(0, buffer);

        clQueue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(100), cl::NullRange);
}
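
If arguments have to be settable at any time (as in my real code), the rule is simply that the cl::Buffer wrapper must stay alive at least until the kernel has been enqueued (and, to be safe, until the command has finished); in the MWE above it lives until the end of main. A sketch with a container holding the wrappers (kernelBuffers is a hypothetical name, not from my code):

//keep the wrappers alive alongside the kernel instead of using temporaries
std::vector<cl::Buffer> kernelBuffers;
kernelBuffers.push_back(genBuffer(clContext, std::vector<cl_float>(100)));
kernel.setArg(0, kernelBuffers.back());
//... possibly set other arguments later ...
clQueue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(100), cl::NullRange);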

Upvotes: 2
