Peter VARGA

Reputation: 5186

CUDA Segmentation fault in threads with no CUDA code

I have this code:

#include <mutex>
#include <unistd.h>

std::mutex mutexCudaExecution;

__global__ void testCuda() {}

void runCuda();   // forward declaration so wrapperLock() can call it

void wrapperLock()
{
    std::lock_guard<std::mutex> lock(mutexCudaExecution);

    // changing this value to 20000 does NOT trigger "Segmentation fault"
    usleep(5000);
    runCuda();
}

void runCuda()
{
    testCuda<<<1, 1>>>();
    cudaDeviceSynchronize();
}

When these functions are executed from around 20 threads, I get a segmentation fault. As noted in the comment, changing the value passed to usleep() to 20000 makes it work fine.

Is there an issue with CUDA and threads?
It looks to me as if CUDA needs a bit of time to recover after a kernel execution finishes, even when there was nothing to do.

Upvotes: 0

Views: 862

Answers (2)

Peter VARGA

Reputation: 5186

UPDATE:

According to http://docs.nvidia.com/cuda/cuda-c-programming-guide/#um-gpu-exclusive, the problem was concurrent access to the Unified Memory I am using. I had to wrap the CUDA kernel calls and the access to Unified Memory with a std::lock_guard, and now the program has been running for 4 days under heavy thread load without any problems.

I also have to call cudaSetDevice in each thread, as suggested by Marco and Robert; otherwise it crashes again. A minimal sketch of that setup follows.
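This is only an illustrative sketch, assuming a hypothetical Unified Memory buffer sharedData and a trivial kernel; the real code of course does more, but the pattern is the same: bind the device in every thread, then keep the kernel launch, the synchronization, and the host-side access to Unified Memory under one lock.

#include <mutex>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

std::mutex mutexCudaExecution;
int *sharedData = nullptr;          // hypothetical Unified Memory buffer

__global__ void addOne(int *data) { data[0] += 1; }

void threadWorker()
{
    // Bind this host thread to the device first (Marco's / Robert's suggestion).
    cudaSetDevice(0);

    // Serialize kernel launches *and* host access to Unified Memory.
    std::lock_guard<std::mutex> lock(mutexCudaExecution);
    addOne<<<1, 1>>>(sharedData);
    cudaDeviceSynchronize();
    sharedData[0] += 1;             // host-side access stays inside the lock
}

int main()
{
    cudaMallocManaged(&sharedData, sizeof(int));
    *sharedData = 0;

    std::vector<std::thread> threads;
    for (int i = 0; i < 20; ++i)
        threads.emplace_back(threadWorker);
    for (auto &t : threads)
        t.join();

    cudaFree(sharedData);
    return 0;
}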

Upvotes: 0

Marco A.

Reputation: 43662

When sharing a single CUDA context, multiple host threads should either delegate their CUDA work to a context-owner thread (similar to a worker thread) or bind the context in each thread with cuCtxSetCurrent (driver API) or cudaSetDevice (runtime API), so that they do not overwrite each other's context resources. A sketch of the delegation approach is shown below.
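The following is a minimal sketch of the delegation idea, not part of the original answer: a single context-owner thread (CudaWorker, an illustrative class) binds the device once and executes every CUDA task submitted to it, so no other host thread ever touches the context.

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <cuda_runtime.h>

__global__ void testCuda() {}

// Single context-owner thread: all CUDA calls happen on this thread.
class CudaWorker {
public:
    CudaWorker() : done(false), worker([this] { run(); }) {}
    ~CudaWorker() {
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
        worker.join();              // drains remaining tasks, then exits
    }

    void submit(std::function<void()> task) {
        { std::lock_guard<std::mutex> lk(m); tasks.push(std::move(task)); }
        cv.notify_one();
    }

private:
    void run() {
        cudaSetDevice(0);           // context bound once, in this thread only
        std::unique_lock<std::mutex> lk(m);
        while (!done || !tasks.empty()) {
            cv.wait(lk, [this] { return done || !tasks.empty(); });
            while (!tasks.empty()) {
                auto task = std::move(tasks.front());
                tasks.pop();
                lk.unlock();
                task();             // e.g. launch testCuda and synchronize
                lk.lock();
            }
        }
    }

    std::mutex m;
    std::condition_variable cv;
    std::queue<std::function<void()>> tasks;
    bool done;
    std::thread worker;
};

int main()
{
    CudaWorker worker;
    for (int i = 0; i < 20; ++i)
        worker.submit([] {
            testCuda<<<1, 1>>>();
            cudaDeviceSynchronize();
        });
    return 0;                       // destructor drains the queue and joins
}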

Upvotes: 3
