user3275666

Reputation: 1

Multi-GPU code in MATLAB runs for only a few seconds

I am running the following MATLAB code on a system with one GTX 1080 and one K80 (which has 2 GPUs):

delete(gcp('nocreate'));
parpool('local',2);

spmd
    gpuDevice(labindex+1)
end

reset(gpuDevice(2))
reset(gpuDevice(3))

parfor i=1:100
    SingleGPUMatlabCode(i);
end

The code runs for about a second. When I rerun it a few seconds later, I get the following message:

Error using parallel.gpu.CUDADevice/reset
An unexpected error occurred during CUDA execution. The
CUDA error was:
unknown error

Error in CreateDictionary
reset(gpuDevice(2))

I tried increasing TdrDelay, but it did not help.

Upvotes: 0

Views: 170

Answers (1)

Joss Knight

Reputation: 322

Something in your GPU code is causing an error on the device. Because the code runs asynchronously, the error is not picked up until the next synchronisation point, which is when you run the code again. I would need to see the contents of SingleGPUMatlabCode to know what that error might be. Perhaps there is an allocation failure or an out-of-bounds access. Errors that aren't correctly handled get converted to 'unknown error' at the next CUDA operation.

Try adding wait(gpuDevice) inside the loop to identify when the error is occurring.
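
A minimal sketch of that suggestion, assuming the parfor loop and SingleGPUMatlabCode from the question. The try/catch and the fprintf message are illustrative additions, not part of the original code. Synchronising after every iteration should make a pending device error surface in the iteration that caused it, at some cost in throughput:

```matlab
parfor i = 1:100
    dev = gpuDevice;            % device currently selected on this worker
    try
        SingleGPUMatlabCode(i); % function from the question
        wait(dev);              % block until all queued work on dev has finished
    catch err
        % Illustrative reporting: shows which iteration and device failed
        fprintf('Iteration %d failed on GPU %d: %s\n', i, dev.Index, err.message);
        rethrow(err);
    end
end
```

Once the failing iteration is known, the wait call can be removed again so the loop runs asynchronously as before.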

If either device 2 or 3 is the GTX 1080, you may have discovered an issue with MATLAB's restricted support for the Pascal architecture. See https://www.mathworks.com/matlabcentral/answers/309235-can-i-use-my-nvidia-pascal-architecture-gpu-with-matlab-for-gpu-computing

If this were caused by the Windows timeout, you would see a screen blackout lasting several seconds.
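
To check which index maps to which card, a short listing along these lines may help. It is only a sketch, best run in a fresh MATLAB session with no parallel pool open, because gpuDevice(k) selects and resets device k. It also raises an error if MATLAB does not support a device. Pascal cards such as the GTX 1080 report compute capability 6.x:

```matlab
for k = 1:gpuDeviceCount
    d = gpuDevice(k);   % note: selecting a device also resets it
    fprintf('Device %d: %s (compute capability %s)\n', ...
            d.Index, d.Name, d.ComputeCapability);
end
```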

Upvotes: 1
