user1816546

OpenCL passing array to multiple instances of the same kernel

I have a kernel that I would like to run multiple instances of, with a different array input each time. The kernel has two array inputs, which I will call A and B, and I would like to vary B.

After a lot of reading, I found that the best way to do this is to create multiple queues and run each instance of the kernel in its own queue. So, if I have 5 different array inputs, I need to create 5 queues. Here is a rough example of what my code looks like:

// Create memory buffer on device for A and B
cl_mem a_mem_obj = clCreateBuffer(context, CL_MEM_READ_ONLY,...);
cl_mem b_mem_obj[5];

for(int i = 0; i < 5; ++i)
    b_mem_obj[i] = clCreateBuffer(context, CL_MEM_READ_ONLY,...);

// Copy the list A to the memory buffers
for(int i = 0; i < 5; ++i)
    clEnqueueWriteBuffer(queue[i], A, ...);

//Set kernel arguments here

for(int i = 0; i < 5; ++i)
{
    clEnqueueWriteBuffer(queue[i], B[i], ...);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &b_mem_obj[i]);
    clEnqueueNDRangeKernel(queue[i], kernel, ...);
    clEnqueueReadBuffer(queue[i], output, ...);
}    

for (int i = 0; i < conv1_filters ; i ++) {
    clFlush(queue[i]);
    clFinish(queue[i]);
}

The above works for small matrices, but as soon as the size exceeds 100, I get a segmentation fault. Any advice? I am using a single GPU device, by the way. Thanks!

Answers (1)

huseyin tugrul buyukisik

clSetKernelArg(kernel, 1, sizeof(cl_mem), &b_mem_obj[i]);

does not enqueue anything, since it takes no queue parameter. It takes effect immediately and replaces the argument value set before it. So either you need to create N kernel objects from the same kernel source (one per queue), or you need to set ALL the arguments and use only the necessary one in each queue (for example, by passing the queue number as a parameter so the kernel knows which argument array to use).

Using many arguments may slow execution, and the number of kernel arguments is limited to a small value. Using many kernels created from the same source may not slow execution, and the limit on how many you can create should be much larger.

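Setting ALL arguments:

clSetKernelArg(kernel, 1 + i, sizeof(cl_mem), &b_mem_obj[i]);
Each memory object gets a different argument index, so there is no conflict; the kernel picks one using the queue index.
The kernel's parameter list grows with the number of arrays (100 arrays means 100 parameters), which is bad.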

Multiple kernels:

clSetKernelArg(kernel[i], 1, sizeof(cl_mem), &b_mem_obj[i]);
The kernel code stays clean and easy to read.
No extra kernel code is needed.
