Reputation: 9547
I am using the C++ binding for OpenCL and when enqueuing one of my kernels I get a cl::Error
which says -38 (CL_INVALID_MEM_OBJECT)
for clEnqueueNDRangeKernel
.
This error is not listed as one of the possible errors of clEnqueueNDRangeKernel. The notify function gives me following output:
CL_INVALID_MEM_OBJECT error executing CL_COMMAND_NDRANGE_KERNEL on GeForce GTX 560 (Device 0).
I have yet to find a minimal example that exhibits this behaviour.
What can cause this kind of error when calling this function?
Using google I only found this answer yet. It states that I need to resetKernelArg
an attached memory object if it has been updated. (At least that is my interpretation of it and there is no detailed explanation of what updated means.)
However, I doubt that this is correct, although I cannot prove it. Maybe you know an official source on this?
After some testing I found that adding a __global const float*
parameter to the kernel introduced the error. I also found that the error does only occur every time if I clSetKernelArg
this new argument after another (already existing) argument. If I do so before the other argument is set, it works every second time. Of course this is not an option as I need to be able to set the argument at any time.
I noticed that stepping through the code with a debugging "reintroduces" the error in the version where I set the new argument before the other one. (This means the error occurs again every time.)
Could this be some kind of race condition? I do not use multithreading myself, but in the debugger there are 7 threads which could come from Qt or OpenCL.
#include <CL/cl.hpp>
#include <vector>
#include <iostream>
#define STRINGIFY(x) #x
std::string kernel = STRINGIFY(
__kernel void apply(__global const float *param1)
{
}
);
template <class T>
cl::Buffer genBuffer(const cl::Context &context, const std::vector<T> &data,
cl_mem_flags flags = CL_MEM_READ_ONLY)
{
return cl::Buffer(context, flags | CL_MEM_COPY_HOST_PTR,
data.size() * sizeof(data[0]),
const_cast<T*>(&data[0]));
}
int main()
{
std::vector<cl::Platform> clPlatforms;
cl::Platform::get(&clPlatforms);
cl_context_properties props[] = {
CL_CONTEXT_PLATFORM, (cl_context_properties)clPlatforms[0](),
0};
cl::Context clContext = cl::Context(CL_DEVICE_TYPE_GPU, props);
std::vector<cl::Device> devices = clContext.getInfo<CL_CONTEXT_DEVICES>();
if(devices.empty())
{
std::cerr << "No devices found!\n";
exit(-1);
}
cl::Device clDevice = devices[0];
cl::CommandQueue clQueue = cl::CommandQueue(clContext, clDevice, 0, 0);
cl::Program program(clContext, cl::Program::Sources(1,
std::make_pair(kernel.c_str(), kernel.size())));
program.build(devices);
cl::Kernel kernel(program, "apply");
//this introduces the error
kernel.setArg(0, genBuffer(clContext, std::vector<cl_float>(100));
//the error is triggered here
clQueue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(100), cl::NullRange);
}
Upvotes: 4
Views: 2069
Reputation: 9547
the problem was that I attached a buffer to a kernel assuming the kernel would retain the buffer. Then I destructed all referencing cl::Buffer
/Memory
objects which caused the OpenCL implementation to delete the buffer.
After running my program through valgrind I noticed that the opencl.so
accessed memory of an object that was freed earlier in a cl::~Buffer
subroutine. Reading up on clSetKernelArg
I noticed:
Users may not rely on a kernel object to retain objects specified as argument values to the kernel.
The undeterministic behaviour clearly is a result of the driver accessing a freed memory area thereby entering UB land.
#include <CL/cl.hpp>
#include <vector>
#include <iostream>
#define STRINGIFY(x) #x
std::string kernel = STRINGIFY(
__kernel void apply(__global const float *param1)
{
}
);
template <class T>
cl::Buffer genBuffer(const cl::Context &context, const std::vector<T> &data,
cl_mem_flags flags = CL_MEM_READ_ONLY)
{
return cl::Buffer(context, flags | CL_MEM_COPY_HOST_PTR,
data.size() * sizeof(data[0]),
const_cast<T*>(&data[0]));
}
int main()
{
std::vector<cl::Platform> clPlatforms;
cl::Platform::get(&clPlatforms);
cl_context_properties props[] = {
CL_CONTEXT_PLATFORM, (cl_context_properties)clPlatforms[0](),
0};
cl::Context clContext = cl::Context(CL_DEVICE_TYPE_GPU, props);
std::vector<cl::Device> devices = clContext.getInfo<CL_CONTEXT_DEVICES>();
if(devices.empty())
{
std::cerr << "No devices found!\n";
exit(-1);
}
cl::Device clDevice = devices[0];
cl::CommandQueue clQueue = cl::CommandQueue(clContext, clDevice, 0, 0);
cl::Program program(clContext, cl::Program::Sources(1,
std::make_pair(kernel.c_str(), kernel.size())));
program.build(devices);
cl::Kernel kernel(program, "apply");
//this version triggers the error
//kernel.setArg(0, genBuffer(clContext, std::vector<cl_float>(100));
//This is how it is done correctly
cl::Buffer buffer = genBuffer(clContext, std::vector<cl_float>(100));
kernel.setArg(0, buffer);
clQueue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(100), cl::NullRange);
}
Upvotes: 2