DH K

Reputation: 9

What is the difference between re-enqueuing a kernel from the host and calling the function several times inside the kernel?

I am trying to implement image convolution using OpenCL.

__kernel void convolution_read4(__global uchar *in1, __global uchar* in2,
    __constant float* mask, int  height,  int  width,  int  kernelSize,
    __local float* lMem, int  localHeight, int  localWidth)
{
    convolution(in1, in2, mask, height, width, kernelSize, lMem, localHeight, localWidth);
    convolution(in2, in1, mask, height, width, kernelSize, lMem, localHeight, localWidth);
    convolution(in1, in2, mask, height, width, kernelSize, lMem, localHeight, localWidth);
}

The kernel above calls the same function three times, swapping the input and output buffers between calls.

    // Arguments for the first pass; arguments 2 to 8 stay the same for every pass.
    err = kernel.setArg(0, d_inputImage);
    err |= kernel.setArg(1, d_outputImage);
    err |= kernel.setArg(2, d_filter);
    err |= kernel.setArg(3, Height);
    err |= kernel.setArg(4, Width);
    err |= kernel.setArg(5, kernelSize);
    err |= kernel.setArg(6, localSize, NULL);  // local memory: size only, no host pointer
    err |= kernel.setArg(7, localHeight);
    err |= kernel.setArg(8, localWidth);

    int totalWorkItemX = roundUp(Width - paddingPixels, wgWidth);
    int totalWorkItemY = roundUp(Height - paddingPixels, wgHeight);

    cl::NDRange globalws(totalWorkItemX, totalWorkItemY);
    cl::NDRange localws(wgWidth, wgHeight);

    // Pass 1: d_inputImage -> d_outputImage
    err |= queue.enqueueNDRangeKernel(kernel, cl::NullRange,
        globalws, localws, NULL, NULL);

    // Pass 2: d_outputImage -> d_inputImage (buffers swapped)
    err |= kernel.setArg(0, d_outputImage);
    err |= kernel.setArg(1, d_inputImage);
    err |= queue.enqueueNDRangeKernel(kernel, cl::NullRange,
        globalws, localws, NULL, NULL);

    // Pass 3: d_inputImage -> d_outputImage
    err |= kernel.setArg(0, d_inputImage);
    err |= kernel.setArg(1, d_outputImage);
    err |= queue.enqueueNDRangeKernel(kernel, cl::NullRange,
        globalws, localws, NULL, NULL);

    queue.finish();

This host code also runs the same "convolution" function three times, but with the kernel changed to call it only once per launch:

__kernel void convolution_read4(__global uchar *in1, __global uchar* in2,
    __constant float* mask, int  height,  int  width,  int  kernelSize,
    __local float* lMem, int  localHeight, int  localWidth)
{
    convolution(in1, in2, mask, height, width, kernelSize, lMem, localHeight, localWidth);
}

I think these two versions should be equivalent, but the first one (three calls inside a single kernel) produces a wrong output image. I don't understand what the difference between the two is.

Upvotes: 0

Views: 58

Answers (1)

jprice

Reputation: 9925

Your convolution function presumably reads a whole input image from global memory and writes an entire output image to global memory. The difference between calling this function three times in a single kernel invocation and once in each of three separate kernel invocations is that, within a single invocation, writes to global memory by one work-item are not guaranteed to be visible to work-items in other work-groups. This means that during the second call to convolution in your first example, work-items that need pixels produced by another work-group may read stale values instead of the output of the first call. With three separate invocations enqueued in order, each pass has finished writing before the next one starts reading.
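To make the hazard concrete, here is a small CPU-side model in plain C++ (no OpenCL involved). It is only a sketch under assumptions: the 1D 3-tap sum, the names blurRange, threeLaunches and fusedLaunch, and the choice to run work-groups one after another are all made up for illustration, and a real device may schedule work-groups in any order.

```cpp
#include <algorithm>
#include <vector>

// One convolution pass over indices [begin, end): 3-tap sum, clamped edges.
// Hypothetical stand-in for the convolution() function in the question.
void blurRange(const std::vector<int>& in, std::vector<int>& out,
               int begin, int end) {
    const int last = static_cast<int>(in.size()) - 1;
    for (int i = begin; i < end; ++i) {
        out[i] = in[std::max(i - 1, 0)] + in[i] + in[std::min(i + 1, last)];
    }
}

// Models three separate kernel launches: each pass completes for the whole
// array before the next pass starts.
std::vector<int> threeLaunches(std::vector<int> a) {
    const int n = static_cast<int>(a.size());
    std::vector<int> b(a.size(), 0);
    blurRange(a, b, 0, n);  // pass 1: a -> b
    blurRange(b, a, 0, n);  // pass 2: b -> a
    blurRange(a, b, 0, n);  // pass 3: a -> b
    return b;
}

// Models one launch containing three calls: each work-group runs all three
// passes on its own tile before the next group starts (one possible schedule).
// Reads that cross a tile boundary see whatever the neighbour has written
// so far, which is not the finished result of the previous pass.
std::vector<int> fusedLaunch(std::vector<int> a, int groupSize) {
    const int n = static_cast<int>(a.size());
    std::vector<int> b(a.size(), 0);
    for (int begin = 0; begin < n; begin += groupSize) {
        const int end = std::min(begin + groupSize, n);
        blurRange(a, b, begin, end);
        blurRange(b, a, begin, end);
        blurRange(a, b, begin, end);
    }
    return b;
}
```

For the input {0, 0, 0, 1, 0, 0, 0, 0}, threeLaunches returns {1, 3, 6, 7, 6, 3, 1, 0}, while fusedLaunch with a group size of 4 returns something different because of the reads across the tile boundary. With a group size of 8 (a single group) the two agree, which matches the point that the problem only arises between work-groups.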

OpenCL does not provide any means to synchronise global memory across an entire kernel invocation. You can synchronise memory within a work-group using the barrier function, which might allow you to implement your algorithm in a single kernel invocation, with some modifications.
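If you stay with separate kernel invocations, the repeated argument swapping on the host can be wrapped in a loop. The following is a minimal sketch, not part of any OpenCL API: runPingPong is a hypothetical helper, and the launch callable stands in for "set arguments 0 and 1, then enqueue the kernel".

```cpp
#include <utility>

// Hypothetical helper: performs `passes` launches, swapping the roles of the
// two buffers after each one. `launch(src, dst)` is supplied by the caller.
template <typename Buffer, typename Launch>
Buffer* runPingPong(Buffer& first, Buffer& second, int passes, Launch launch) {
    Buffer* src = &first;
    Buffer* dst = &second;
    for (int p = 0; p < passes; ++p) {
        launch(*src, *dst);   // one complete pass over the whole image
        std::swap(src, dst);  // this pass's output is the next pass's input
    }
    return src;  // after the final swap, src points at the newest result
}
```

With the code from the question, launch would call kernel.setArg(0, src), kernel.setArg(1, dst) and queue.enqueueNDRangeKernel(...), and the call would be runPingPong(d_inputImage, d_outputImage, 3, launch). This assumes an in-order command queue, so that each launch finishes before the next one begins.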

Upvotes: 2
