Reputation: 5186
I have this code:
#include <mutex>
#include <unistd.h>

std::mutex mutexCudaExecution; // serializes the CUDA calls across threads

__global__ void testCuda() {}

void runCuda()
{
    testCuda<<<1, 1>>>();
    cudaDeviceSynchronize();
}

void wrapperLock()
{
    std::lock_guard<std::mutex> lock(mutexCudaExecution);
    // changing this value to 20000 does NOT trigger "Segmentation fault"
    usleep(5000);
    runCuda();
}
When these functions are executed from around 20 threads, I get a Segmentation fault. As noted in the comment, changing the value in usleep() to 20000 makes it work fine.
Is there an issue with CUDA and threads? It looks to me like CUDA needs a bit of time to recover after a kernel execution has finished, even when there was nothing to do.
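For context, the threads are spawned roughly like this (a minimal sketch; the exact thread count and driver code are my assumptions):
#include <thread>
#include <vector>

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 20; ++i)
        threads.emplace_back(wrapperLock);
    for (auto& t : threads)
        t.join();
}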
Upvotes: 0
Views: 862
Reputation: 5186
UPDATE:
According to http://docs.nvidia.com/cuda/cuda-c-programming-guide/#um-gpu-exclusive the problem was concurrent access to the Unified Memory I am using. I had to wrap the CUDA kernel calls and the access to the Unified Memory in a std::lock_guard, and now the program has been running for 4 days under heavy thread load without any problems.
I also have to call cudaSetDevice in each thread, as suggested by Marco & Robert; otherwise it crashes again.
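For reference, a minimal sketch of the fix, assuming a __managed__ variable stands in for the Unified Memory in question (the variable name and kernel body are illustrative):
#include <mutex>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

std::mutex mutexCudaExecution;
__managed__ int sharedValue; // stand-in for the Unified Memory in question

__global__ void testCuda() { sharedValue++; }

void threadWork()
{
    cudaSetDevice(0); // bind the context in every thread, as suggested

    // serialize the kernel launch AND the host-side Unified Memory access
    std::lock_guard<std::mutex> lock(mutexCudaExecution);
    testCuda<<<1, 1>>>();
    cudaDeviceSynchronize();
    int v = sharedValue; // safe: no kernel is in flight while we hold the lock
    (void)v;
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 20; ++i)
        threads.emplace_back(threadWork);
    for (auto& t : threads)
        t.join();
}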
Upvotes: 0
Reputation: 43662
With a single CUDA context, multiple host threads should either delegate their CUDA work to a context-owner thread (similar to a worker thread) or bind the context with cuCtxSetCurrent (driver API) or cudaSetDevice (runtime API), so that they do not overwrite each other's context resources.
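A minimal sketch of the delegation variant, assuming a simple task queue (the class and its names are illustrative, not part of any CUDA API):
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

__global__ void testCuda() {}

// A single owner thread performs all CUDA work; other threads only enqueue.
class CudaOwnerThread {
public:
    CudaOwnerThread() : worker_(&CudaOwnerThread::run, this) {}
    ~CudaOwnerThread() {
        { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
        cv_.notify_one();
        worker_.join(); // drains remaining tasks before exiting
    }
    void enqueue(std::function<void()> task) {
        { std::lock_guard<std::mutex> lock(m_); tasks_.push(std::move(task)); }
        cv_.notify_one();
    }
private:
    void run() {
        cudaSetDevice(0); // the context is bound once, in the owner thread
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (stop_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task(); // every CUDA call happens on this one thread
        }
    }
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::thread worker_;
};

int main() {
    CudaOwnerThread owner;
    for (int i = 0; i < 20; ++i)
        owner.enqueue([] {
            testCuda<<<1, 1>>>();
            cudaDeviceSynchronize();
        });
}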
Upvotes: 3