Reputation: 39
I'm getting this error and have not found any relevant answer for CUDNN_STATUS_EXECUTION_FAILED. Sometimes training does start, but it is too slow for the training process to be using the GPU. Details below in case they help.
Specs I'm Using:
CPU = Core-i7 9th Gen Hexacore
RAM = 16GB
GPU = Nvidia GTX 1660Ti 6-GB
MATLAB = R2018b Version
Code:
options = trainingOptions('sgdm', ...
    'MiniBatchSize',32, ...
    'MaxEpochs',10, ...
    'InitialLearnRate',1e-4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',augimdsValidation, ...
    'ValidationFrequency',3, ...
    'Verbose',false, ...
    'Plots','training-progress');
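% Warm-up call to the internal GPU library; any error from this first call is ignored.
% Note: the package is nnet, not net. With the typo, the call fails silently inside try/catch.
try
    nnet.internal.cnngpu.reluForward(1);
catch ME
end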
netTransfer = trainNetwork(augimdsTrain,layers,options);
Error Detail:
Warning: The CUDA driver must recompile the GPU libraries because your device is more recent than the
libraries. Recompiling can take several minutes. Learn more.
> In parallel.internal.gpu.selectDevice
In parallel.gpu.GPUDevice.current (line 44)
In gpuDevice (line 23)
In nnet.internal.cnn.util.isGPUCompatible (line 10)
In nnet.internal.cnn.util.GPUShouldBeUsed (line 17)
In nnet.internal.cnn.assembler.setupExecutionEnvironment (line 24)
In trainNetwork>doTrainNetwork (line 171)
In trainNetwork (line 148)
In viperMat (line 45)
Error using trainNetwork (line 150)
Unexpected error calling cuDNN: CUDNN_STATUS_EXECUTION_FAILED.
Upvotes: 1
Views: 1228
Reputation: 39
So, the slow performance was due to the larger batch size passed during training; reducing the batch size made it faster (though it still does not compare to the Python libraries). As for the error, you can re-run the code a few times until it goes away, or you can simply add the line below at the start of your code to suppress the recompilation warning that accompanies it.
warning off parallel:gpu:device:DeviceLibsNeedsRecompiling
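For anyone who wants both points in one place, here is a rough sketch, not a confirmed fix. It assumes the Parallel Computing Toolbox and Deep Learning Toolbox are installed. The warm-up computation, the variable names gpuOk and env, and the batch size of 16 are my own illustrative choices, and the validation and plot options from the question are left out so the snippet stands on its own. The small GPU computation is only meant to let any one-time library compilation happen before training; it may not exercise cuDNN itself.

```matlab
% Sketch only: assumes Parallel Computing Toolbox and Deep Learning Toolbox.

% Silence the recompilation warning shown in the question.
warning('off', 'parallel:gpu:device:DeviceLibsNeedsRecompiling');

% Check that a GPU is usable and run a small computation on it first,
% so that one-time compilation happens here rather than inside trainNetwork.
gpuOk = false;
if gpuDeviceCount > 0
    try
        g = gpuDevice;                          % select the default GPU
        fprintf('Using %s (compute capability %s)\n', ...
            g.Name, g.ComputeCapability);
        x = rand(64, 'single', 'gpuArray');     % small array created on the GPU
        s = gather(sum(x(:)));                  % force the computation to finish
        gpuOk = true;
    catch ME
        warning('GPU warm-up failed: %s', ME.message);
    end
end

% Request the GPU explicitly. With 'gpu', trainNetwork raises an error if
% no usable GPU is found instead of quietly training on the CPU.
if gpuOk
    env = 'gpu';
else
    env = 'cpu';
end

options = trainingOptions('sgdm', ...
    'MiniBatchSize', 16, ...                    % assumed value, smaller than the original 32
    'MaxEpochs', 10, ...
    'InitialLearnRate', 1e-4, ...
    'Shuffle', 'every-epoch', ...
    'ExecutionEnvironment', env, ...
    'Verbose', false);

% Then train as in the question (augimdsTrain and layers come from there):
% netTransfer = trainNetwork(augimdsTrain, layers, options);
```

Setting 'ExecutionEnvironment' explicitly makes it easy to tell whether slow training is a GPU problem or a silent fallback to the CPU.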
Hope this helps people with a similar issue.
Upvotes: 1