Reputation: 9
I am trying to make a convolution Image using Opencl.
__kernel void convolution_read4(__global uchar *in1, __global uchar* in2,
__constant float* mask, int height, int width, int kernelSize,
__local float* lMem, int localHeight, int localWidth)
{
convolution(in1, in2, mask, height, width, kernelSize, lMem, localHeight, localWidth);
convolution(in2, in1, mask, height, width, kernelSize, lMem, localHeight, localWidth);
convolution(in1, in2, mask, height, width, kernelSize, lMem, localHeight, localWidth);
}
Upper code executes same function 3 times.
err = kernel.setArg(0, d_inputImage);
err |= kernel.setArg(1, d_outputImage);
err |= kernel.setArg(2, d_filter);
err |= kernel.setArg(3, Height);
err |= kernel.setArg(4, Width);
err |= kernel.setArg(5, kernelSize);
err |= kernel.setArg(6, localSize, NULL);
err |= kernel.setArg(7, localHeight);
err |= kernel.setArg(8, localWidth);
int totalWorkItemX = roundUp(Width - paddingPixels, wgWidth);
int totalWorkItemY = roundUp(Height - paddingPixels , wgHeight);
cl::NDRange globalws(totalWorkItemX, totalWorkItemY);
cl::NDRange localws(wgWidth, wgHeight);
err = queue.enqueueNDRangeKernel(kernel, cl::NullRange,
globalws, localws, NULL, NULL);
err = kernel.setArg(1, d_inputImage);
err |= kernel.setArg(0, d_outputImage);
err = queue.enqueueNDRangeKernel(kernel, cl::NullRange,
globalws, localws, NULL, NULL);
err = kernel.setArg(0, d_inputImage);
err |= kernel.setArg(1, d_outputImage);
err = queue.enqueueNDRangeKernel(kernel, cl::NullRange,
globalws, localws, NULL, NULL);
queue.finish();
this code also execute same function "Convolution", but kernel code was changed like that.
__kernel void convolution_read4(__global uchar *in1, __global uchar* in2,
__constant float* mask, int height, int width, int kernelSize,
__local float* lMem, int localHeight, int localWidth)
{
convolution(in1, in2, mask, height, width, kernelSize, lMem, localHeight, localWidth);
}
I think this two code is same code. but first code drew a wrong output. I don't know what is difference between this two.
Upvotes: 0
Views: 58
Reputation: 9925
Your convolution
function presumably takes a whole input image from global memory, and produces an entire output image in global memory. The difference between calling this function three times in a single kernel invocation and one time from three separate kernel invocations is that writes to global memory by one work-item are not visible to other work-items in the same kernel invocation. This means that during the second call to convolution
in your first example, work-items will be reading stale values, and not seeing the output of the first call to this function.
OpenCL does not provide any means to synchronise global memory across an entire kernel invocation. You can synchronise memory within a work-group using the barrier
function, which might allow you to implement your algorithm in a single kernel invocation, with some modifications.
Upvotes: 2