kalonymus

Reputation: 27

plus equal (+=) operator in pycuda

I would like to implement a variant of convolution in pycuda.

For simplicity, I'll use a rectangular (box) interpolation kernel. The standard convolution can be applied as follows:

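import pycuda.autoinit
import pycuda.driver as drv
import numpy as np
from pycuda.compiler import SourceModule

mod = SourceModule("""
#include <stdio.h>
__global__ void func(float *dest, float *a)
{
  const int img_size = 64;
  const int kernel_size = 3;
  const int kernel_size_half = kernel_size/2;
  const int tx = blockIdx.x * blockDim.x + threadIdx.x;
  const int ty = blockIdx.y * blockDim.y + threadIdx.y;
  int tx_kernel;

  // skip threads that fall outside the image
  if ((tx >= img_size) || (ty >= img_size))
    return;

  // gather: each thread reads its horizontal neighbours and writes
  // only its own output pixel, so no two threads write the same element
  for (int idx=-kernel_size_half; idx <= kernel_size_half; idx++)
  {
      tx_kernel = tx + idx;
      if ((tx_kernel < 0) || (tx_kernel > img_size-1))
        continue;  // neighbour is outside the image
      dest[ty * img_size + tx] +=  a[ty * img_size + tx_kernel] / ((float) kernel_size);
  }
}
""")

func = mod.get_function("func")
a = np.random.rand(64, 64).astype(np.float32)
dest = np.zeros_like(a)  # must start at zero because the kernel accumulates with +=
# 4x4 blocks of 16x16 threads cover the 64x64 image
func(drv.InOut(dest), drv.In(a), block=(16, 16, 1), grid=(4, 4))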

Instead of computing the current pixel from its neighbours, I would like to do the opposite: add the value of the current pixel to its neighbours.

That is, change the line:

dest[ty * img_size + tx] +=  a[ty * img_size + tx_kernel] / ((float) kernel_size);

to:

dest[ty * img_size + tx_kernel] +=  a[ty * img_size + tx] / ((float) kernel_size);

However, the first works fine but the second does not; it fails when updating the neighbours. Is there a way to work around this?

Note: I simplified the question to focus on what I need. The general problem is to use a different convolution kernel for each pixel instead of the same kernel for every pixel, as in the example above.
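The per-pixel-kernel case can be prototyped on the CPU before writing CUDA. This is a minimal 1-D sketch under stated assumptions: each row is processed independently as in the kernel above, and kernels[s, j] (a hypothetical width x kernel_size layout) is the weight that source pixel s gives to pixel s + j - kernel_size//2. The function names are illustrative only. It shows that the scatter formulation can be rewritten as a gather in which each output pixel reads its neighbours' kernels, which keeps the one-writer-per-output property of the first method; this rewrite is one possible approach, not something stated in the thread.

```python
import numpy as np


def scatter_per_pixel(row, kernels):
    """Scatter: source pixel s adds row[s] * kernels[s, j] to pixel s + j - half.

    np.add.at accumulates unbuffered, so repeated target indices are all
    summed (a CPU stand-in for atomicAdd)."""
    row = np.asarray(row, dtype=np.float64)
    width, ksize = kernels.shape
    half = ksize // 2
    src, tap = np.meshgrid(np.arange(width), np.arange(ksize), indexing="ij")
    tgt = src + tap - half
    ok = (tgt >= 0) & (tgt < width)  # drop contributions that land outside the row
    dest = np.zeros(width)
    np.add.at(dest, tgt[ok], row[src[ok]] * kernels[src[ok], tap[ok]])
    return dest


def gather_per_pixel(row, kernels):
    """Gather: output pixel x reads every source s whose kernel tap lands on x.

    Each dest[x] is written only by the iteration for x, which is what a
    race-free one-thread-per-output CUDA kernel would do."""
    row = np.asarray(row, dtype=np.float64)
    width, ksize = kernels.shape
    half = ksize // 2
    dest = np.zeros(width)
    for x in range(width):
        for tap in range(ksize):
            s = x - tap + half  # source pixel whose tap `tap` hits x
            if 0 <= s < width:
                dest[x] += row[s] * kernels[s, tap]
    return dest
```

Replacing the np.add.at call with dest[tgt[ok]] += ... would keep only one contribution per repeated index, which is the NumPy analogue of the lost updates described in the question.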

Upvotes: 1

Views: 88

Answers (1)

Robert Crovella

Reputation: 152123

> to change the line:
>
> dest[ty * img_size + tx] +=  a[ty * img_size + tx_kernel] / ((float) kernel_size);
>
> to:
>
> dest[ty * img_size + tx_kernel] +=  a[ty * img_size + tx] / ((float) kernel_size);
>
> However, the first works fine but the second does not; it fails when updating the neighbours. Is there a way to work around this?

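In the second form, each dest element is updated by several threads (up to kernel_size of them, one per neighbour). A plain += is a non-atomic read-modify-write, so when those threads run concurrently their updates can overwrite each other and some contributions are lost. The first method is preferred from a performance perspective, because each thread writes only its own output element and no atomics are needed. However, if you wish to "update the neighbors", it should be possible to recast the second operation as: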

atomicAdd(&(dest[ty * img_size + tx_kernel]), a[ty * img_size + tx] / ((float) kernel_size));
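Why the plain += loses updates while atomicAdd does not can be illustrated with a simplified model in plain Python. This is only a sketch: simulate_plus_equal is a hypothetical helper, and it models the worst-case schedule (every thread loads the old value before any thread stores), whereas real GPU scheduling is nondeterministic, so the number of lost updates varies from run to run. Note also that atomicAdd on float requires compute capability 2.0 or later, and because the order of atomic additions is not fixed, float results can differ in the last bits between runs.

```python
def simulate_plus_equal(n_threads, atomic):
    """Simulate n_threads threads that each add 1.0 to the same dest element.

    Non-atomic: each += is split into a load and a store, using the
    worst-case schedule in which every load happens before any store.
    Atomic: each thread's load, add and store form one indivisible step,
    which is the guarantee atomicAdd provides."""
    dest = 0.0
    if atomic:
        for _ in range(n_threads):
            dest = dest + 1.0  # indivisible read-modify-write
    else:
        loaded = [dest for _ in range(n_threads)]  # every thread reads the old value...
        for value in loaded:
            dest = value + 1.0  # ...then each writes back its own stale sum
    return dest


# An interior pixel receives kernel_size = 3 contributions in the scatter version.
print(simulate_plus_equal(3, atomic=False))  # 1.0: two of the three updates are lost
print(simulate_plus_equal(3, atomic=True))   # 3.0
```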

Upvotes: 2
