braks

Reputation: 1610

CL_INVALID_ARG_VALUE when setting buffer arguments

I'm writing an OpenCL program and have found that I'm reading a buffer back as all zeroes. Diving into the Intel SDK tracing, I've found that I get CL_INVALID_ARG_VALUE when SETTING the buffer arguments. (Setting scalar arguments does not produce an error.)

I'm using the OpenCL C++ Bindings (cl.hpp).

Since my code is long, I've replicated the problem with a test program.

cl::CommandQueue queue(context, devices.front());

cl::Buffer resultsBuf(context, CL_MEM_WRITE_ONLY | CL_MEM_HOST_READ_ONLY, sizeof(cl_short) * 2048);
cl::Buffer inputBuf(context, CL_MEM_READ_ONLY | CL_MEM_HOST_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(cl_uchar) * 2048, input.data());

queue.enqueueWriteBuffer(inputBuf, CL_TRUE, 0, sizeof(cl_uchar) * 2048, input.data());

// Execution of the following two lines produces CL_INVALID_ARG_VALUE for both.
err = kernel.setArg(0, resultsBuf);
err = kernel.setArg(1, inputBuf);

// Execution of the following line produces CL_INVALID_KERNEL_ARGS
err = queue.enqueueTask(kernel);

vector<cl_short> result(2048);
err = queue.enqueueReadBuffer(resultsBuf, CL_TRUE, 0, sizeof(cl_short) * 2048, result.data());

and the kernel code:

__kernel void myKernel(
    __local short* resultsBuf,
    __local uchar* inputBuf
) {
    for (int i = 0; i < 2048; ++i) {
       resultsBuf[i] = -3; 
    }
}

input is a vector<cl_uchar>(2048) filled with some test data; it isn't used for anything yet. All I'm expecting from this test case is to read back a buffer filled with the value -3.

I've compared my code with other samples I've found online, and nothing jumps out at me as odd. I've tried various small adjustments (like changing the mem flags), but I can't seem to improve the situation.

Is there something I've overlooked about buffers?

(Curiously, my test program does fill result with a few junk bytes.)

Upvotes: 0

Views: 1254

Answers (1)

doqtor

Reputation: 8494

To pass data back and forth between the host and the GPU you have to use global memory. That seems to be done OK on the host side, but in your kernel you use the __local address space qualifier, which, as the name suggests, is memory used locally within the kernel (per work-group) and can't be read back by the host. Fixed kernel to use __global:

__kernel void myKernel(
    __global short* resultsBuf,
    __global uchar* inputBuf
) {
    for (int i = 0; i < 2048; ++i) {
       resultsBuf[i] = -3; 
    }
}
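With the parameters declared __global, the setArg calls from the question work as written.

As a side note (a minimal sketch, assuming the same cl.hpp bindings used in the question): an argument declared __local is set from the host with just a size, never with a cl::Buffer, which is why passing a buffer to it produces CL_INVALID_ARG_VALUE. The cl::Local helper in cl.hpp exists for exactly this case:

// Only relevant if a kernel parameter is genuinely declared __local.
// cl::Local(size) yields a cl::LocalSpaceArg: the host passes only a byte
// count and the runtime reserves that much work-group local memory.
// There is no buffer object and nothing the host can read back afterwards.
cl_int err = kernel.setArg(0, cl::Local(sizeof(cl_short) * 2048));

For returning results to the host, though, __global (as above) is the right choice, since __local memory is scratch space private to a work-group.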

Upvotes: 3
