Reputation: 5186
I have this code:
#include <mutex>
#include <unistd.h>

std::mutex mutexCudaExecution; // serializes the CUDA calls across threads

__global__ void testCuda() {}

void runCuda()
{
    testCuda<<<1, 1>>>();
    cudaDeviceSynchronize();
}

void wrapperLock()
{
    std::lock_guard<std::mutex> lock(mutexCudaExecution);
    // changing this value to 20000 does NOT trigger "Segmentation fault"
    usleep(5000);
    runCuda();
}
When these functions are executed from around 20 threads, I get a Segmentation fault. As noted in the comment, changing the value in usleep() to 20000 makes it work fine.
Is there an issue with CUDA and threads? It looks to me like CUDA needs a bit of time to recover after a kernel execution has finished, even when there was nothing to do.
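For context, the threads are spawned roughly like this (a minimal sketch; the exact thread count and driver code are my assumptions):
#include <thread>
#include <vector>

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 20; ++i)
        threads.emplace_back(wrapperLock);
    for (auto& t : threads)
        t.join();
}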
Upvotes: 0
Views: 862
Reputation: 5186
UPDATE:
According to http://docs.nvidia.com/cuda/cuda-c-programming-guide/#um-gpu-exclusive the problem was concurrent access to the Unified Memory I am using. I had to wrap the CUDA kernel calls and the access to the Unified Memory in a std::lock_guard, and now the program has been running for 4 days under heavy thread load without any problems.
I also have to call cudaSetDevice in each thread, as suggested by Marco & Robert; otherwise it crashes again.
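For reference, a minimal sketch of the fix, assuming a __managed__ variable stands in for the Unified Memory in question (the variable name and kernel body are illustrative):
#include <mutex>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

std::mutex mutexCudaExecution;
__managed__ int sharedValue; // stand-in for the Unified Memory in question

__global__ void testCuda() { sharedValue++; }

void threadWork()
{
    cudaSetDevice(0); // bind the context in every thread, as suggested

    // serialize the kernel launch AND the host-side Unified Memory access
    std::lock_guard<std::mutex> lock(mutexCudaExecution);
    testCuda<<<1, 1>>>();
    cudaDeviceSynchronize();
    int v = sharedValue; // safe: no kernel is in flight while we hold the lock
    (void)v;
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 20; ++i)
        threads.emplace_back(threadWork);
    for (auto& t : threads)
        t.join();
}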
Upvotes: 0
Reputation: 43662
With a single CUDA context, multiple host threads should either delegate their CUDA work to a context-owner thread (similar to a worker thread) or bind the context with cuCtxSetCurrent (driver API) or cudaSetDevice (runtime API), so that they do not overwrite each other's context resources.
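A minimal sketch of the delegation variant, assuming a simple task queue (the class and its names are illustrative, not part of any CUDA API):
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

__global__ void testCuda() {}

// A single owner thread performs all CUDA work; other threads only enqueue.
class CudaOwnerThread {
public:
    CudaOwnerThread() : worker_(&CudaOwnerThread::run, this) {}
    ~CudaOwnerThread() {
        { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
        cv_.notify_one();
        worker_.join(); // drains remaining tasks before exiting
    }
    void enqueue(std::function<void()> task) {
        { std::lock_guard<std::mutex> lock(m_); tasks_.push(std::move(task)); }
        cv_.notify_one();
    }
private:
    void run() {
        cudaSetDevice(0); // the context is bound once, in the owner thread
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (stop_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task(); // every CUDA call happens on this one thread
        }
    }
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::thread worker_;
};

int main() {
    CudaOwnerThread owner;
    for (int i = 0; i < 20; ++i)
        owner.enqueue([] {
            testCuda<<<1, 1>>>();
            cudaDeviceSynchronize();
        });
}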
Upvotes: 3