Reputation: 2488
Here is my code. I have an array of (x, y) pairs, and for each point I want to compute the distance to the farthest point in the array.
#include <cstdio>
#include <cstdlib>
#define GPUERRCHK(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}
__device__ float computeDist( float x1, float y1, float x2, float y2 )
{
    float delx = x2 - x1;
    float dely = y2 - y1;
    return sqrt( delx*delx + dely*dely );
}
__global__ void kernel( float * x, float * y, float * dev_dist_sum, int N )
{
    int tid = blockIdx.x*gridDim.x + threadIdx.x;
    float a = x[tid]; //............(alpha)
    float b = y[tid]; //............(beta)
    if( tid < N )
    {
        float maxDist = -1;
        for( int k=0 ; k<N ; k++ )
        {
            //float dist = computeDist( x[tid], y[tid], x[k], y[k] ); //....(gamma)
            float dist = computeDist( a, b, x[k], y[k] ); //....(delta)
            if( dist > maxDist )
                maxDist = dist;
        }
        dev_dist_sum[tid] = maxDist;
    }
}
int main()
{
    // ...
    kernel<<<(N+31)/32,32>>>( dev_x, dev_y, dev_dist_sum, N );
    GPUERRCHK( cudaPeekAtLastError() );
    GPUERRCHK( cudaDeviceSynchronize() );
    // ...
}
I have an NVIDIA GeForce 420M, and I have verified that CUDA works with it on my computer. When I run the code above with N = 50000, the kernel fails with the error message "unspecified error message". However, it seems to work fine for smaller values such as N = 10000.
Also, if I comment out alpha, beta and delta (see the markers in the code) and uncomment gamma, the code works even for large values of N such as 50000 or 100000.
I want to use alpha and beta to reduce memory traffic: each thread keeps its own coordinates in local variables (registers) instead of re-reading them from global memory on every loop iteration.
How do I fix this issue?
Upvotes: 0
Views: 571
Reputation: 181
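@mkuse, gridDim describes how thread blocks are arranged in the grid, and blockDim describes how threads are arranged within each block. Both are dim3 values: a block can be 3-D, and a grid is 2-D on pre-Fermi devices (3-D from compute capability 2.0 on). For instance, dim3 gridDim(2,3,1) means 2 thread blocks in the x direction and 3 thread blocks in the y direction; on your card (compute capability 2.1) each grid dimension can be at most 65535 = 2^16 - 1. dim3 blockDim(32,16,1) works at thread granularity: 32 threads in the x direction and 16 in the y direction, 512 threads in total. threadIdx only identifies a thread within its own block, so once you launch multiple blocks you have to combine it with blockIdx and blockDim to get a unique global index. In your kernel that means
int tid = blockIdx.x*blockDim.x + threadIdx.x;
rather than blockIdx.x*gridDim.x. With gridDim.x, tid advances by the number of blocks (1563 for N = 50000) per block instead of by 32, so most indices overshoot N. Because alpha and beta read x[tid] and y[tid] before the if( tid < N ) check, those out-of-bounds reads make the kernel fail; the gamma version only reads x[tid] inside the check, which is why it appeared to work (although it still computes only a small fraction of the outputs). Move those two reads inside the if as well.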
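Below is a minimal sketch, not a tested drop-in replacement, of the question's kernel with the thread index computed from blockDim.x instead of gridDim.x, and with the alpha/beta loads of x[tid] and y[tid] moved inside the tid < N check. The farthestDistances host wrapper is a hypothetical helper, not part of the question, added only so the example compiles on its own; it assumes the points are already in host arrays and keeps the question's launch of 32-thread blocks.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails
// (same pattern as the question's GPUERRCHK/gpuAssert).
#define GPUERRCHK(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true)
{
    if (code != cudaSuccess) {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

__device__ float computeDist(float x1, float y1, float x2, float y2)
{
    float delx = x2 - x1;
    float dely = y2 - y1;
    return sqrtf(delx * delx + dely * dely);
}

__global__ void kernel(const float *x, const float *y, float *dev_dist_sum, int N)
{
    // Global index = (blocks before this one) * (threads per block) + offset.
    // blockDim.x is threads per block; gridDim.x is the number of blocks.
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < N) {
        // Load this thread's own point only after the bounds check, so the
        // surplus threads of the last block never read past x[N-1] / y[N-1].
        float a = x[tid];
        float b = y[tid];
        float maxDist = -1.0f;
        for (int k = 0; k < N; k++) {
            float dist = computeDist(a, b, x[k], y[k]);
            if (dist > maxDist)
                maxDist = dist;
        }
        dev_dist_sum[tid] = maxDist;
    }
}

// Hypothetical host wrapper: copies the points to the device, launches the
// kernel with 32-thread blocks as in the question, and copies the results back.
void farthestDistances(const float *hx, const float *hy, float *hout, int N)
{
    assert(N > 0);  // a launch with zero blocks is an invalid configuration
    float *dev_x, *dev_y, *dev_dist_sum;
    size_t bytes = N * sizeof(float);
    GPUERRCHK(cudaMalloc(&dev_x, bytes));
    GPUERRCHK(cudaMalloc(&dev_y, bytes));
    GPUERRCHK(cudaMalloc(&dev_dist_sum, bytes));
    GPUERRCHK(cudaMemcpy(dev_x, hx, bytes, cudaMemcpyHostToDevice));
    GPUERRCHK(cudaMemcpy(dev_y, hy, bytes, cudaMemcpyHostToDevice));

    kernel<<<(N + 31) / 32, 32>>>(dev_x, dev_y, dev_dist_sum, N);
    GPUERRCHK(cudaPeekAtLastError());
    GPUERRCHK(cudaDeviceSynchronize());

    GPUERRCHK(cudaMemcpy(hout, dev_dist_sum, bytes, cudaMemcpyDeviceToHost));
    GPUERRCHK(cudaFree(dev_x));
    GPUERRCHK(cudaFree(dev_y));
    GPUERRCHK(cudaFree(dev_dist_sum));
}
```

With N = 50000 and 32 threads per block this launches (50000 + 31) / 32 = 1563 blocks, well within the 65535 per-dimension grid limit of compute capability 2.x devices, and every tid from 0 to N - 1 is now covered exactly once.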
Upvotes: 1