CUDA Kernel Fails to Launch

Question

Here is my code. I have an array of (x, y) pairs, and for each point I want to compute the distance to the point farthest from it.

#define GPUERRCHK(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess)
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

__device__ float computeDist( float x1, float y1, float x2, float y2 )
{
    float delx = x2 - x1;
    float dely = y2 - y1;
    return sqrt( delx*delx + dely*dely );
}

__global__ void kernel( float * x, float * y, float * dev_dist_sum, int N )
{
    int tid = blockIdx.x*gridDim.x + threadIdx.x;
    float a = x[tid];  //............(alpha)
    float b = y[tid];  //............(beta)
    if( tid < N )
    {
    float maxDist = -1;
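    for( int k=0 ; k<N ; k++ )
    {
        //float dist = computeDist( x[tid], y[tid], x[k], y[k] );  //............(gamma)
        float dist = computeDist( a, b, x[k], y[k] );  //............(delta)
        if( dist > maxDist )
            maxDist = dist;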
    }
    dev_dist_sum[tid] = maxDist;
    }
}

int main()
{
.
.

    kernel<<<(N+31)/32,32>>>( dev_x, dev_y, dev_dist_sum, N );
    GPUERRCHK( cudaPeekAtLastError() );
    GPUERRCHK( cudaDeviceSynchronize() );

.
.

}

I have an NVIDIA GeForce 420M, and I have verified that CUDA works with it on my computer. When I run the code above with N = 50000, the kernel fails to launch with the error message "unspecified error message". However, it seems to work fine for a smaller value such as N = 10000.

Also, if I comment out the lines marked alpha, beta, and delta (see the markings in the code) and uncomment gamma, the code works even for large values of N such as 50000 or 100000.

I want to keep alpha and beta so that each thread reads its own point into thread-local variables once, instead of reading it from global memory on every loop iteration.

How do I fix this issue?
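
One likely cause, offered as a sketch under assumptions rather than a confirmed diagnosis: the index is computed with gridDim.x (the number of blocks) instead of blockDim.x (the number of threads per block), and alpha and beta read x[tid] and y[tid] before the tid < N check. With N = 50000 the launch uses (50000+31)/32 = 1563 blocks, so the largest tid is 1562*1563 + 31 = 2441437, far beyond a 50000-element array. With N = 10000 the largest tid is 312*313 + 31 = 97687, which is still out of bounds but may happen to land in memory the device can read. The gamma variant only touches x[tid] inside the guard, which would explain why it appears to work. The version below keeps alpha and beta in thread-local variables but moves them inside the bounds check. The names kernelChecked and farthestDistances (a host wrapper) are hypothetical and introduced only for this illustration, and sqrtf is swapped in for sqrt to stay in single precision.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define GPUERRCHK(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        exit(code);
    }
}

__device__ float computeDist(float x1, float y1, float x2, float y2)
{
    float delx = x2 - x1;
    float dely = y2 - y1;
    return sqrtf(delx*delx + dely*dely);
}

// Hypothetical corrected kernel: same loop as in the question,
// with the index and the guard changed.
__global__ void kernelChecked(const float *x, const float *y, float *dev_dist_sum, int N)
{
    // blockDim.x is threads per block; gridDim.x would be blocks per grid.
    int tid = blockIdx.x*blockDim.x + threadIdx.x;
    if (tid >= N) return;          // guard BEFORE any access to x[tid] or y[tid]

    float a = x[tid];              // alpha: read once, only for a valid tid
    float b = y[tid];              // beta
    float maxDist = -1;
    for (int k = 0; k < N; k++)
    {
        float dist = computeDist(a, b, x[k], y[k]);
        if (dist > maxDist)
            maxDist = dist;
    }
    dev_dist_sum[tid] = maxDist;
}

// Hypothetical host wrapper (not part of the question): copies the points to
// the device, launches the kernel, and returns the per-point maximum distance.
std::vector<float> farthestDistances(const std::vector<float> &x, const std::vector<float> &y)
{
    int N = (int)x.size();
    std::vector<float> out(N);
    if (N == 0) return out;

    size_t bytes = N * sizeof(float);
    float *dev_x = nullptr, *dev_y = nullptr, *dev_dist_sum = nullptr;
    GPUERRCHK(cudaMalloc((void**)&dev_x, bytes));
    GPUERRCHK(cudaMalloc((void**)&dev_y, bytes));
    GPUERRCHK(cudaMalloc((void**)&dev_dist_sum, bytes));
    GPUERRCHK(cudaMemcpy(dev_x, x.data(), bytes, cudaMemcpyHostToDevice));
    GPUERRCHK(cudaMemcpy(dev_y, y.data(), bytes, cudaMemcpyHostToDevice));

    kernelChecked<<<(N+31)/32, 32>>>(dev_x, dev_y, dev_dist_sum, N);
    GPUERRCHK(cudaPeekAtLastError());
    GPUERRCHK(cudaDeviceSynchronize());

    GPUERRCHK(cudaMemcpy(out.data(), dev_dist_sum, bytes, cudaMemcpyDeviceToHost));
    GPUERRCHK(cudaFree(dev_x));
    GPUERRCHK(cudaFree(dev_y));
    GPUERRCHK(cudaFree(dev_dist_sum));
    return out;
}
```

Calling farthestDistances on the three collinear points (0,0), (3,4), (6,8) should give a maximum distance of 10 for the two end points and 5 for the middle one, which makes a quick sanity check before scaling N back up to 50000.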
