Iterative image processing in CUDA

Question

I have written a CUDA kernel to process an image. But depending on the output of the processed image, I have to call the kernel again, to re-tune the image. For example, let us consider an image having 9 pixels

1 2 3
4 5 6
7 8 9

Suppose that, depending on its neighboring values, the value 9 changes to 10. Since the value has changed, I have to re-process the new image, with the same kernel.

1 2 3
4 5 6
7 8 10

I have already written the algorithm to process the image in a single iteration. The way I'm planning to implement the iterations in CUDA is the following:

__global__ void process_image_GPU(unsigned int *d_input, unsigned int *d_output, int dataH, int dataW, unsigned int *val) {

     __shared__ unsigned int sh_map[TOTAL_WIDTH][TOTAL_WIDTH];
     // Do processing
     // If any thread changes a pixel value during processing, it records the change:
            { atomicAdd(val, 1); }

}
int main(int argc, char *argv[]) {
    // Allocate d_input, d_output and call cudaMemcpy
    unsigned int *x, *val;
    x = (unsigned int *)malloc(sizeof(unsigned int));
    x[0] = 0;
    cudaMalloc((void **)&val, sizeof(unsigned int));
    cudaMemcpy((void *)val, (void *)x, sizeof(unsigned int), cudaMemcpyHostToDevice);
    process_image_GPU<<<...>>>(d_input, d_output, rows, cols, val);
    cudaMemcpy((void *)x, (void *)val, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    if (x[0] != 0) {
        // Call the kernel again
    }
}

Is this the only way to do it? Is there a more efficient way to implement the same thing?

Thanks a lot for your time.

Vitality · Accepted Answer

I'll hazard an answer, despite the very limited information you provided. Hope it helps.

From what you have said, you have already set up an updating rule for your pixels, based on the values of the adjacent pixels. Let x^(k)_ij be the value of pixel ij at iteration k, and let

x^(k+1)_ij = f(x^(k)_(i-1)j, x^(k)_ij, x^(k)_(i+1)j, x^(k)_i(j-1), x^(k)_i(j+1))

I'm assuming the typical stencil-based updating rule, but of course other rules would be possible.
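As a minimal sketch of one such stencil step in CUDA: the question does not state the actual rule f, so the code below assumes a purely illustrative one (each pixel becomes the minimum of itself and each 4-neighbour plus one, a distance-like relaxation). The names `update_rule`, `stencil_step` and `run_one_step` are hypothetical, border pixels reuse their own value for missing neighbours, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical rule f (an assumption, not the asker's rule):
// new value = min(centre, neighbour + 1) over the four neighbours.
// The "n < m" guard keeps n + 1 from wrapping around.
__host__ __device__ inline unsigned int update_rule(unsigned int up, unsigned int centre,
                                                    unsigned int down, unsigned int left,
                                                    unsigned int right) {
    unsigned int m = centre;
    if (up    < m && up    + 1 < m) m = up    + 1;
    if (down  < m && down  + 1 < m) m = down  + 1;
    if (left  < m && left  + 1 < m) m = left  + 1;
    if (right < m && right + 1 < m) m = right + 1;
    return m;
}

// One iteration k -> k+1: reads only from `in`, writes only to `out`,
// so every pixel sees the values of iteration k.
__global__ void stencil_step(const unsigned int *in, unsigned int *out, int H, int W) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (i >= H || j >= W) return;

    unsigned int c     = in[i * W + j];
    unsigned int up    = (i > 0)     ? in[(i - 1) * W + j] : c;
    unsigned int down  = (i < H - 1) ? in[(i + 1) * W + j] : c;
    unsigned int left  = (j > 0)     ? in[i * W + (j - 1)] : c;
    unsigned int right = (j < W - 1) ? in[i * W + (j + 1)] : c;
    out[i * W + j] = update_rule(up, c, down, left, right);
}

// Host helper: copies the image to the device, applies one step, copies it back.
std::vector<unsigned int> run_one_step(const std::vector<unsigned int> &img, int H, int W) {
    size_t bytes = img.size() * sizeof(unsigned int);
    unsigned int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, img.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    stencil_step<<<grid, block>>>(d_in, d_out, H, W);

    std::vector<unsigned int> out(img.size());
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);  // blocks until the kernel is done
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

Writing to a separate output buffer is what makes the update "simultaneous": no thread ever reads a value that another thread has already updated in the same iteration.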

At this point, you have to set up a stopping rule, namely, a rule that indicates if your algorithm has reached convergence. For example, you could evaluate the norm of the difference between the two images at steps k+1 and k.
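One possible stopping rule, sketched under assumptions: the infinity norm (largest absolute pixel difference) between two successive images, computed with `atomicMax`. The names `max_abs_diff` and `diff_norm_inf` are hypothetical, both images are assumed to have the same size, and error checking is omitted. Other norms, or a simple "did anything change" flag as in the question, would work as well.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread compares one pixel pair and folds |a - b| into *result.
__global__ void max_abs_diff(const unsigned int *a, const unsigned int *b, int n,
                             unsigned int *result) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    unsigned int d = (a[idx] > b[idx]) ? a[idx] - b[idx] : b[idx] - a[idx];
    if (d > 0) atomicMax(result, d);  // only pixels that differ issue an atomic
}

// Host helper: returns max_i |prev[i] - next[i]| (0 means the images are identical).
unsigned int diff_norm_inf(const std::vector<unsigned int> &prev,
                           const std::vector<unsigned int> &next) {
    int n = static_cast<int>(prev.size());
    if (n == 0) return 0;
    size_t bytes = prev.size() * sizeof(unsigned int);

    unsigned int *d_a = nullptr, *d_b = nullptr, *d_result = nullptr;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_result, sizeof(unsigned int));
    cudaMemcpy(d_a, prev.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, next.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_result, 0, sizeof(unsigned int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    max_abs_diff<<<blocks, threads>>>(d_a, d_b, n, d_result);

    unsigned int h_result = 0;
    cudaMemcpy(&h_result, d_result, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_result);
    return h_result;
}
```

For the two 3x3 images in the question (9 changing to 10), this norm is 1, so another iteration would be run. In a real pipeline both images would already live on the device, so only the kernel launch and the single-word copy of the result would be needed per check.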

Once the problem is formulated in this way, I would say that you have the following two possibilities:

Rouy-Tourin-like scheme: all the computational pixels are updated in a brute-force way "simultaneously" until convergence is reached;
Fast sweeping method: the computational grid is swept (selective update) along a predefined number of directions until convergence is reached.
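The first option (brute-force simultaneous update until convergence) is close to what the question proposes. A minimal sketch, assuming an illustrative relaxation rule (minimum of the pixel and each 4-neighbour plus one) because the real rule is not given: two device buffers are swapped between iterations, and the kernel raises a device-side flag when any pixel changes. The names `relax_step` and `iterate_until_stable` and the `max_iters` safety cap are hypothetical; error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

// One simultaneous update of all pixels; sets *changed if any pixel differs
// from its previous value. The rule itself is an assumption for illustration.
__global__ void relax_step(const unsigned int *in, unsigned int *out, int H, int W,
                           unsigned int *changed) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (i >= H || j >= W) return;

    unsigned int c = in[i * W + j];
    unsigned int v = c;
    // Missing neighbours at the border fall back to the centre value.
    unsigned int nb[4] = {
        (i > 0)     ? in[(i - 1) * W + j] : c,
        (i < H - 1) ? in[(i + 1) * W + j] : c,
        (j > 0)     ? in[i * W + (j - 1)] : c,
        (j < W - 1) ? in[i * W + (j + 1)] : c};
    for (int k = 0; k < 4; ++k)
        if (nb[k] < v && nb[k] + 1 < v) v = nb[k] + 1;  // guard avoids wrap-around

    out[i * W + j] = v;
    if (v != c) atomicOr(changed, 1u);
}

// Repeats relax_step until no pixel changes or max_iters is reached.
// Returns the number of kernel launches, including the final "no change" pass.
int iterate_until_stable(std::vector<unsigned int> &img, int H, int W, int max_iters) {
    size_t bytes = img.size() * sizeof(unsigned int);
    unsigned int *d_cur = nullptr, *d_next = nullptr, *d_changed = nullptr;
    cudaMalloc((void **)&d_cur, bytes);
    cudaMalloc((void **)&d_next, bytes);
    cudaMalloc((void **)&d_changed, sizeof(unsigned int));
    cudaMemcpy(d_cur, img.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);

    int iters = 0;
    unsigned int h_changed = 1;
    while (h_changed != 0 && iters < max_iters) {
        cudaMemset(d_changed, 0, sizeof(unsigned int));
        relax_step<<<grid, block>>>(d_cur, d_next, H, W, d_changed);
        // Only one word crosses the bus per iteration; the image stays on the device.
        cudaMemcpy(&h_changed, d_changed, sizeof(unsigned int), cudaMemcpyDeviceToHost);
        std::swap(d_cur, d_next);  // the newest image is now d_cur
        ++iters;
    }

    cudaMemcpy(img.data(), d_cur, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_cur);
    cudaFree(d_next);
    cudaFree(d_changed);
    return iters;
}
```

The blocking copy of the flag synchronizes host and device on every iteration. If the update rule tolerates extra passes once it has converged, as this one does, checking the flag only every few iterations is one way to reduce that overhead.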

Depending on the kind of problem you are dealing with, I would say that you have one additional possibility:

Fast iterative method: the computational pixels are selectively updated with the aid of a heap structure.

All the above methods have been compared, for the solution of the eikonal equation, here.

Of course, you will need to show convergence of the above computational schemes for your particular problem.
