Reputation: 444
During inference, when the models are being loaded, CUDA throws: InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory.
I am performing inference on a machine with 6 GB of VRAM. A few days ago the machine was able to perform these tasks, but now I frequently get this error. Restarting the machine sometimes helps, but that is not a viable solution. I have checked with nvidia-smi: it shows only about 500 MB of VRAM in use, and I did not see any spike in memory usage while TensorFlow was trying to load the models.
I am currently using TensorFlow 1.14.0 and Python 3.7.4.
Upvotes: 0
Views: 3555
Reputation: 73
I am using TensorFlow 2.3.0 on a remote server. My code was working fine, but the server suddenly lost its network connection and my training stopped. When I re-ran the code I got the same error you did, so I suspect the problem is that the GPU is still held by a process that no longer exists. Clearing the session, as the comment suggests, is enough to solve the problem (I believe restarting the machine would also fix it, but I did not get the chance to try that).
For TensorFlow 2.3, call tf.keras.backend.clear_session(); that solved the issue for me.
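A minimal sketch of how this fits into an inference script (the model path and input shape here are placeholders, not from the original post):

    import numpy as np
    import tensorflow as tf

    # Release any stale graph/session state left over from the interrupted run
    # before loading the model again.
    tf.keras.backend.clear_session()

    # Hypothetical model file and input shape, purely for illustration.
    model = tf.keras.models.load_model("my_model.h5")
    dummy_input = np.zeros((1, 224, 224, 3), dtype=np.float32)
    predictions = model.predict(dummy_input)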
Upvotes: 2