mkuse

Reputation: 2488

CUDA kernel fails to launch

Here is my code. I have an array of (x, y) pairs, and for each point I want to compute the distance to the farthest point in the array.

#define GPUERRCHK(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess)
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

__device__ float computeDist( float x1, float y1, float x2, float y2 )
{
    float delx = x2 - x1;
    float dely = y2 - y1;
    return sqrt( delx*delx + dely*dely );
}

__global__ void kernel( float * x, float * y, float * dev_dist_sum, int N )
{
    int tid = blockIdx.x*gridDim.x + threadIdx.x;
    float a = x[tid];  //............(alpha)
    float b = y[tid];  //............(beta)
    if( tid < N )
    {
        float maxDist = -1;
        for( int k=0 ; k<N ; k++ )
        {
            //float dist = computeDist( x[tid], y[tid], x[k], y[k] ); //....(gamma)
            float dist = computeDist( a, b, x[k], y[k] );             //....(delta)
            if( dist > maxDist )
                maxDist = dist;
        }
        dev_dist_sum[tid] = maxDist;
    }
}

int main()
{
.
.

    kernel<<<(N+31)/32,32>>>( dev_x, dev_y, dev_dist_sum, N );
    GPUERRCHK( cudaPeekAtLastError() );
    GPUERRCHK( cudaDeviceSynchronize() );

.
.

}
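To check what the kernel should produce, here is a minimal serial CPU sketch of the same computation: for every point, the largest Euclidean distance to any point in the array. The function name `maxDistancesCpu` and the use of `std::vector` inputs are assumptions for illustration; comparing its result with `dev_dist_sum` after copying it back to the host would show whether every entry was actually computed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical serial O(N^2) reference: for each point i, the largest
// Euclidean distance from (x[i], y[i]) to any point in the array.
std::vector<float> maxDistancesCpu(const std::vector<float>& x,
                                   const std::vector<float>& y) {
    assert(x.size() == y.size());  // one x and one y per point
    const std::size_t n = x.size();
    std::vector<float> result(n);
    for (std::size_t i = 0; i < n; ++i) {
        float maxDist = -1.0f;  // same starting value as the kernel
        for (std::size_t k = 0; k < n; ++k) {
            const float dx = x[k] - x[i];
            const float dy = y[k] - y[i];
            const float d = std::sqrt(dx * dx + dy * dy);
            if (d > maxDist) maxDist = d;
        }
        result[i] = maxDist;
    }
    return result;
}
```

For the points (0,0), (3,4) and (6,8) this gives 10, 5 and 10.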

I have an NVIDIA GeForce 420M, and I have verified that CUDA works on this machine. When I run the code above with N = 50000, the kernel fails and the error check reports "unspecified error message". With a smaller value such as N = 10000, however, it appears to work.

Also, if I comment out alpha, beta and delta (see the markers in the code) and uncomment gamma, the code works even for large values of N such as 50000 or 100000.
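The difference between the two variants can be reproduced on the host. In the alpha/beta version, `x[tid]` and `y[tid]` are read before the `if( tid < N )` check, so every thread whose `tid` is at least `N` reads past the end of the arrays; in the gamma version those reads only happen inside the check. The sketch below is host-side C++ with hypothetical names (`questionTid`, `IndexStats`, `emulateLaunch`); it enumerates every thread of `kernel<<<(N+31)/32,32>>>` using the kernel's own index expression, `blockIdx.x*gridDim.x + threadIdx.x`.

```cpp
#include <cassert>

// Index expression copied from the question's kernel (gridDim.x, not blockDim.x).
int questionTid(int blockIdxX, int gridDimX, int threadIdxX) {
    return blockIdxX * gridDimX + threadIdxX;
}

struct IndexStats {
    int maxTid;      // largest tid any thread computes
    int inRange;     // threads with tid < N: the only ones that write dev_dist_sum
    int outOfRange;  // threads with tid >= N: in the alpha/beta version these
                     // still read x[tid] and y[tid] before the bounds check
};

// Host-side enumeration of every thread in kernel<<<(N+31)/32, 32>>>.
IndexStats emulateLaunch(int N) {
    assert(N > 0);
    const int blockDimX = 32;
    const int gridDimX = (N + 31) / 32;
    IndexStats s{-1, 0, 0};
    for (int b = 0; b < gridDimX; ++b) {
        for (int t = 0; t < blockDimX; ++t) {
            const int tid = questionTid(b, gridDimX, t);
            if (tid > s.maxTid) s.maxTid = tid;
            if (tid < N) ++s.inRange; else ++s.outOfRange;
        }
    }
    return s;
}
```

For N = 50000 the largest computed index is 2441437, far beyond the 50000-element arrays, and only 1024 threads get an index below N, so at most 1024 entries of `dev_dist_sum` are ever written. For N = 10000 the largest index is 97687; those reads are still out of bounds, and whether they happen to fault likely depends on how the allocations are laid out, so the smaller case appearing to work does not mean its results are correct.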

I want to keep alpha and beta to reduce global-memory traffic: each thread reads its own point once into local variables (registers) instead of reading x[tid] and y[tid] from global memory on every loop iteration.
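Caching the thread's own point in local variables is compatible with a bounds check, as long as the reads come after it. A minimal host-side sketch of one thread's work, assuming `tid` has already been computed as `blockIdx.x * blockDim.x + threadIdx.x`; the name `threadBody` is hypothetical, and in a real kernel this body would sit inside the `__global__` function with `x`, `y` and `dev_dist_sum` as device pointers.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical host-side stand-in for one thread of the kernel.
// In the real kernel, tid would be blockIdx.x * blockDim.x + threadIdx.x.
void threadBody(int tid, const float* x, const float* y, float* out, int N) {
    if (tid >= N) return;       // bounds check first: out-of-range threads read nothing
    assert(x && y && out);
    const float a = x[tid];     // (alpha) this thread's point, read once
    const float b = y[tid];     // (beta)
    float maxDist = -1.0f;
    for (int k = 0; k < N; ++k) {
        const float dx = x[k] - a;
        const float dy = y[k] - b;
        const float d = std::sqrt(dx * dx + dy * dy);  // same as computeDist(a, b, x[k], y[k])
        if (d > maxDist) maxDist = d;
    }
    out[tid] = maxDist;
}
```

Calling it for tid = 0..4 with N = 3 fills the three outputs and leaves the two extra "threads" as no-ops, which is the behaviour the extra threads of the last block need.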

How do I fix this?

Upvotes: 0

Views: 571

Answers (1)

Anirudh Kaushik

Reputation: 181

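@mkuse, the problem is how the thread index is computed: the kernel uses gridDim where it needs blockDim. gridDim describes how the thread blocks are arranged in the grid (2-D on older devices, up to 3-D from compute capability 2.0). For instance, a grid of dim3(2,3,1) has 2 thread blocks in the x direction and 3 in the y direction. On devices of compute capability below 3.0, each grid dimension is limited to 65535 (2^16 - 1) blocks. blockDim describes how the threads are arranged inside each block, in up to three dimensions: a block of dim3(32,16,1) has 32 threads in the x direction and 16 in the y direction, 512 threads in total.

threadIdx only identifies a thread within its own block, so when the launch has multiple blocks, the global index has to combine the block index with the block size: tid = blockIdx.x * blockDim.x + threadIdx.x. With gridDim.x in place of blockDim.x, the computed indices skip most of the array and run far past N.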
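The indexing described above can be checked on the host. This sketch uses hypothetical helpers (`globalIndex`, `blocksFor`, `coverage`, `linearThreadInBlock`), with plain ints standing in for the CUDA built-in variables; it counts how many threads of a 1-D launch handle each element index when the index is computed as `blockIdx.x * blockDim.x + threadIdx.x`.

```cpp
#include <cassert>
#include <vector>

// Global 1-D index as computed inside a CUDA kernel; plain ints stand in for
// blockIdx.x, blockDim.x and threadIdx.x in this host-side sketch.
int globalIndex(int blockIdxX, int blockDimX, int threadIdxX) {
    return blockIdxX * blockDimX + threadIdxX;
}

// Blocks needed to cover N elements (the (N+31)/32 expression, generalised).
int blocksFor(int N, int threadsPerBlock) {
    assert(N > 0 && threadsPerBlock > 0);
    return (N + threadsPerBlock - 1) / threadsPerBlock;
}

// How many threads of the launch handle each element index in [0, N).
std::vector<int> coverage(int N, int threadsPerBlock) {
    std::vector<int> hits(N, 0);
    const int blocks = blocksFor(N, threadsPerBlock);
    for (int b = 0; b < blocks; ++b) {
        for (int t = 0; t < threadsPerBlock; ++t) {
            const int i = globalIndex(b, threadsPerBlock, t);
            if (i < N) ++hits[i];  // same guard as `if( tid < N )` in the kernel
        }
    }
    return hits;
}

// For a 2-D block such as dim3(32, 16, 1) (512 threads), the thread's linear
// position inside its block is threadIdx.y * blockDim.x + threadIdx.x.
int linearThreadInBlock(int threadIdxX, int threadIdxY, int blockDimX) {
    return threadIdxY * blockDimX + threadIdxX;
}
```

With N = 50000 and 32 threads per block, `blocksFor` gives 1563 blocks and every index in [0, N) is handled exactly once. The last block still produces indices up to 50015, which is why the kernel keeps its `tid < N` check.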

Upvotes: 1
