Reputation: 3885
I have a sample loop of the following form. Notice that my psi[i][j] depends on psi[i+1][j], psi[i-1][j], psi[i][j+1] and psi[i][j-1], and I have to calculate psi for the inner matrix only. I tried writing this in CUDA, but the results are not the same as the sequential version.
for(i=1; i<=leni-2; i++)
    for(j=1; j<=lenj-2; j++){
        psi[i][j] = ( omega[i][j]*(dx*dx)*(dy*dy) +
                      (psi[i+1][j]+psi[i-1][j])*(dy*dy) +
                      (psi[i][j+1]+psi[i][j-1])*(dx*dx)
                    ) / (2.0*(dx*dx)+2.0*(dy*dy));
    }
Here's my CUDA version.
//KERNEL
__global__ void ComputePsi(double *psi, double *omega, int imax, int jmax)
{
    int x = blockIdx.x;
    int y = blockIdx.y;
    int i = (jmax*x) + y;
    double beta = 1;
    double dx = (double)30/(imax-1);
    double dy = (double)1/(jmax-1);
    if(i%jmax != 0 && (i+1)%jmax != 0 && i >= jmax && i < imax*jmax-jmax){
        psi[i] = ( omega[i]*(dx*dx)*(dy*dy) +
                   (psi[i+jmax]+psi[i-jmax])*(dy*dy) +
                   (psi[i+1]+psi[i-1])*(dx*dx)
                 ) / (2.0*(dx*dx)+2.0*(dy*dy));
    }
}
//Code
cudaMalloc((void **) &dev_psi, leni*lenj*sizeof(double));
cudaMalloc((void **) &dev_omega, leni*lenj*sizeof(double));
cudaMemcpy(dev_psi, psi, leni*lenj*sizeof(double),cudaMemcpyHostToDevice);
cudaMemcpy(dev_omega, omega, leni*lenj*sizeof(double),cudaMemcpyHostToDevice);
dim3 grids(leni,lenj);
for(iterpsi = 0; iterpsi < 30; iterpsi++)
    ComputePsi<<<grids, 1>>>(dev_psi, dev_omega, leni, lenj);
Here psi[leni][lenj] and omega[leni][lenj] are double arrays. The problem is that the sequential and CUDA codes give different results. Is there any modification needed in the code?
Upvotes: 0
Views: 215
Reputation: 4926
for(i=1; i<=leni-2; i++)
    for(j=1; j<=lenj-2; j++){
        psi[i][j] = ( omega[i][j]*(dx*dx)*(dy*dy) +
                      (psi[i+1][j]+psi[i-1][j])*(dy*dy) +
                      (psi[i][j+1]+psi[i][j-1])*(dx*dx)
                    ) / (2.0*(dx*dx)+2.0*(dy*dy));
    }
I think this loop is not well defined sequentially either: the value of psi[i][j] depends on the order of the operations. Within a sweep you read psi[i+1][j] and psi[i][j+1] before they are updated, while psi[i-1][j] and psi[i][j-1] have already been updated in the same sweep (a Gauss-Seidel-style update). With CUDA the result is bound to differ, because the order of the operations is different. To enforce that ordering, if it is possible at all, you would need so many synchronizations that it is probably not worthwhile in CUDA. Is it really what you need to do?
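If a Jacobi-style update (one that reads only values from the previous sweep) is acceptable for your problem, the sequential reference itself becomes order-independent. A minimal sketch, assuming a second leni x lenj array psi_new (that name is mine, not from your code):

for(i=1; i<=leni-2; i++)
    for(j=1; j<=lenj-2; j++){
        /* every read comes from the old psi, so the traversal order no longer matters */
        psi_new[i][j] = ( omega[i][j]*(dx*dx)*(dy*dy) +
                          (psi[i+1][j]+psi[i-1][j])*(dy*dy) +
                          (psi[i][j+1]+psi[i][j-1])*(dx*dx)
                        ) / (2.0*(dx*dx)+2.0*(dy*dy));
    }
/* copy psi_new back into psi (or swap pointers) before the next sweep */

Such a version parallelizes directly, and both the Gauss-Seidel and the Jacobi sweeps should converge to the same solution of the discretized equation, even though the intermediate iterates differ.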
Upvotes: 0
Reputation: 949
You are working in global memory and changing psi entries while other threads might still need the old values. Store the values of the new iteration in a separate array, and keep in mind that you have to swap the two arrays after each iteration. A more sophisticated approach would be a solution using shared memory, with a spatial subdomain assigned to each thread block. Search for CUDA tutorials on solving the heat/diffusion equation and you will get the idea. A sketch of the double-buffered approach follows.
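Here is a minimal sketch of that double-buffered (Jacobi) idea applied to your kernel. The names ComputePsiJacobi and dev_psi_new are assumptions, and both buffers must be filled with the initial psi (including the boundary values) before the first launch, because the kernel only writes interior points:

__global__ void ComputePsiJacobi(const double *psi_old, double *psi_new,
                                 const double *omega, int imax, int jmax)
{
    int x = blockIdx.x;
    int y = blockIdx.y;
    int i = (jmax*x) + y;
    double dx = (double)30/(imax-1);
    double dy = (double)1/(jmax-1);
    // interior points only: skip the first/last row and column
    if(i%jmax != 0 && (i+1)%jmax != 0 && i >= jmax && i < imax*jmax-jmax){
        psi_new[i] = ( omega[i]*(dx*dx)*(dy*dy) +
                       (psi_old[i+jmax]+psi_old[i-jmax])*(dy*dy) +
                       (psi_old[i+1]+psi_old[i-1])*(dx*dx)
                     ) / (2.0*(dx*dx)+2.0*(dy*dy));
    }
}

// host side: read from one buffer, write to the other, then swap
double *dev_old = dev_psi, *dev_new = dev_psi_new;  // dev_psi_new: a second cudaMalloc'd buffer, initialized like dev_psi
for(iterpsi = 0; iterpsi < 30; iterpsi++){
    ComputePsiJacobi<<<grids, 1>>>(dev_old, dev_new, dev_omega, leni, lenj);
    double *tmp = dev_old; dev_old = dev_new; dev_new = tmp;  // swap after each sweep
}

After the loop, dev_old points to the most recent values (with 30 sweeps that happens to be dev_psi again). One thread per block (<<<grids, 1>>>) is kept only to match your launch configuration; for performance you would normally use a 2-D block of threads and, as said above, shared memory for the stencil reads.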
Upvotes: 1