I am using Visual Studio 2012, and some kernels crashed when I executed the code under CUDA Debugging. Other kernels ran the same code without any problem (on different generated numbers/data). I don't know whether the kernels also crash when I run the program without CUDA Debugging, because I do not get any error in that case.
The error was:
CUDA Debugger detected data stack overflow on 120 threads.
First thread:
blockIdx = {2,0,0}
threadIdx = {1,0,0}
StackPointer = 0x00ffe9d0
StackLimit = 0x00ffea40
Looking in the documentation, I found how to increase the stack size (I also needed to increase the heap size):
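// Increase memory limits
size_t size_heap, size_stack;
// Heap used by device-side malloc(); set it before launching any kernel that calls malloc()
cudaDeviceSetLimit(cudaLimitMallocHeapSize, 20000000*sizeof(double));
// Stack size of each GPU thread, in bytes
cudaDeviceSetLimit(cudaLimitStackSize, 12928);
// Read the limits back to see what the runtime actually applied
cudaDeviceGetLimit(&size_heap, cudaLimitMallocHeapSize);
cudaDeviceGetLimit(&size_stack, cudaLimitStackSize);
printf("Heap size found to be %d; Stack size found to be %d\n", (int)size_heap, (int)size_stack);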
The default stack size was 6464 bytes, so I doubled it to see whether that helps. When I launch the program with the standard Windows debugger, cudaDeviceGetLimit(&size_stack, cudaLimitStackSize) returns 12928, as expected.
However, when I launch the program with the CUDA debugger, it reports a stack size of 1024, not 12928. Why is that?
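A minimal host-side sketch, assuming only the CUDA runtime calls the question already uses (cudaDeviceSetLimit, cudaDeviceGetLimit) plus cudaGetErrorString: it checks the cudaError_t returned by each call instead of discarding it, and prints the requested and applied values side by side. The helper name setAndReadLimit is hypothetical. Reading the value back is worthwhile because the driver is allowed to adjust a requested limit (for example by clamping or rounding), and an unchecked failed call would otherwise look just like an adjusted one.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: set a device limit, then read back the value the
// runtime actually applied (the driver may adjust the requested value).
// *applied is only meaningful when the function returns cudaSuccess.
static cudaError_t setAndReadLimit(cudaLimit limit, size_t requested, size_t *applied)
{
    cudaError_t err = cudaDeviceSetLimit(limit, requested);
    if (err != cudaSuccess)
        return err;
    return cudaDeviceGetLimit(applied, limit);
}

int main()
{
    const size_t requestedHeap = 20000000 * sizeof(double); // device malloc heap, bytes
    const size_t requestedStack = 12928;                    // per-thread stack, bytes
    size_t heap = 0, stack = 0;

    cudaError_t err = setAndReadLimit(cudaLimitMallocHeapSize, requestedHeap, &heap);
    if (err == cudaSuccess)
        err = setAndReadLimit(cudaLimitStackSize, requestedStack, &stack);
    if (err != cudaSuccess) {
        fprintf(stderr, "Setting/reading a limit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print through unsigned long long: older MSVC printf does not accept %zu.
    printf("Heap:  requested %llu, applied %llu\n",
           (unsigned long long)requestedHeap, (unsigned long long)heap);
    printf("Stack: requested %llu, applied %llu\n",
           (unsigned long long)requestedStack, (unsigned long long)stack);
    return 0;
}
```

This does not by itself explain the 1024-byte value seen under the CUDA debugger, but it separates "the call failed" from "the call succeeded and the value was changed".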
It seems it was just a bug: after I updated to the CUDA 7.0 Release Candidate, the stack allocation works correctly.
If you have the same problem, update to the latest drivers and toolkit. The CUDA 7.0 RC is only available to CUDA Registered Developers, so you need to register on NVIDIA's website.
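To confirm which driver and toolkit you are actually running before and after updating, a small sketch like the one below can help. It uses cudaDriverGetVersion and cudaRuntimeGetVersion, which report versions encoded as 1000 * major + 10 * minor (7000 for 7.0). The helper decodeCudaVersion is hypothetical, and the 7000 threshold is used only because 7.0 is the version this answer mentions; it is an assumption, not a documented fix version for this bug.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: CUDA encodes versions as 1000 * major + 10 * minor.
static void decodeCudaVersion(int version, int *major, int *minor)
{
    *major = version / 1000;
    *minor = (version % 1000) / 10;
}

int main()
{
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);   // latest CUDA version the driver supports; 0 if no driver
    cudaRuntimeGetVersion(&runtimeVersion); // version of the CUDA runtime the program uses

    int major = 0, minor = 0;
    decodeCudaVersion(driverVersion, &major, &minor);
    printf("Driver supports CUDA %d.%d\n", major, minor);
    decodeCudaVersion(runtimeVersion, &major, &minor);
    printf("Runtime is CUDA %d.%d\n", major, minor);

    // Assumed threshold, taken from the version named in the answer.
    if (runtimeVersion < 7000)
        printf("Runtime is older than 7.0; consider updating the toolkit.\n");
    return 0;
}
```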