Reputation: 1610
I'm writing an OpenCL program and have found that a buffer reads back as all zeroes. Digging into the Intel SDK tracing, I found that I get CL_INVALID_ARG_VALUE when setting the buffer arguments. (Setting scalar arguments does not produce an error.)
I'm using the OpenCL C++ Bindings (cl.hpp).
Since my code is long I've replicated the problem with a test program.
cl::CommandQueue queue(context, devices.front());
cl::Buffer resultsBuf(context, CL_MEM_WRITE_ONLY | CL_MEM_HOST_READ_ONLY, sizeof(cl_short) * 2048);
cl::Buffer inputBuf(context, CL_MEM_READ_ONLY | CL_MEM_HOST_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(cl_uchar) * 2048, input.data());
queue.enqueueWriteBuffer(inputBuf, CL_TRUE, 0, sizeof(cl_uchar) * 2048, input.data());
// Execution of the following two lines produces CL_INVALID_ARG_VALUE for both.
err = kernel.setArg(0, resultsBuf);
err = kernel.setArg(1, inputBuf);
// Execution of the following line produces CL_INVALID_KERNEL_ARGS
err = queue.enqueueTask(kernel);
vector<cl_short> result(2048);
err = queue.enqueueReadBuffer(resultsBuf, CL_TRUE, 0, sizeof(cl_short) * 2048, result.data());
and the Kernel code:
__kernel void myKernel(
    __local short* resultsBuf,
    __local uchar* inputBuf
) {
    for (int i = 0; i < 2048; ++i) {
        resultsBuf[i] = -3;
    }
}
input is a vector<cl_uchar>(2048) filled with some test data; it isn't used for anything yet. All I'm expecting from this test case is to read back a buffer filled with the value -3.
I've compared my code with other samples I've found online, and nothing jumps out at me as odd. I've tried various small adjustments (like changing the mem flags), but nothing improves the situation.
Is there something I've overlooked about buffers?
(Curiously, my test program does fill result with a few junk bytes.)
Upvotes: 0
Views: 1254
Reputation: 8494
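To pass data between the host and the device you have to use global memory. The host side looks fine, but your kernel declares both pointers with the __local address space qualifier, which, as the name suggests, refers to memory that is local to a work-group on the device. A __local argument can only be given a size (with a null value), so passing a cl::Buffer to it fails with CL_INVALID_ARG_VALUE, and the later enqueue then reports CL_INVALID_KERNEL_ARGS because the arguments were never set.
Here is the kernel fixed to use __global: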
__kernel void myKernel(
    __global short* resultsBuf,
    __global uchar* inputBuf
) {
    for (int i = 0; i < 2048; ++i) {
        resultsBuf[i] = -3;
    }
}
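To confirm the change took effect, it can help to print the return value of each setArg and enqueue call by name rather than as a bare number. Below is a minimal sketch of such a lookup. The helper clErrorName and the k* constants are hypothetical names made up for illustration. The numeric values are copied from CL/cl.h so the snippet compiles without the OpenCL headers; real code would include the header and use the CL_* macros directly.

```cpp
#include <string>

// Values copied from CL/cl.h (assumption: standard Khronos header values).
// In a real program, include <CL/cl.h> and use the CL_* macros instead.
constexpr int kClSuccess           = 0;    // CL_SUCCESS
constexpr int kClInvalidArgIndex   = -49;  // CL_INVALID_ARG_INDEX
constexpr int kClInvalidArgValue   = -50;  // CL_INVALID_ARG_VALUE
constexpr int kClInvalidArgSize    = -51;  // CL_INVALID_ARG_SIZE
constexpr int kClInvalidKernelArgs = -52;  // CL_INVALID_KERNEL_ARGS

// Hypothetical helper: maps the handful of error codes relevant to
// kernel-argument problems to their names. Any other code is reported
// numerically rather than guessed at.
inline std::string clErrorName(int err) {
    switch (err) {
        case kClSuccess:           return "CL_SUCCESS";
        case kClInvalidArgIndex:   return "CL_INVALID_ARG_INDEX";
        case kClInvalidArgValue:   return "CL_INVALID_ARG_VALUE";
        case kClInvalidArgSize:    return "CL_INVALID_ARG_SIZE";
        case kClInvalidKernelArgs: return "CL_INVALID_KERNEL_ARGS";
        default:
            return "unknown OpenCL error (" + std::to_string(err) + ")";
    }
}
```

With this in place, something like std::cerr << clErrorName(kernel.setArg(0, resultsBuf)) should print CL_SUCCESS once the kernel parameters are declared __global.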
Upvotes: 3