Reputation: 791
We are running TensorFlow applications on the GPU from multiple Jupyter notebooks. Every once in a while one of the runs crashes its notebook, with only the notification that "The kernel has crashed...".
When we placed the code into a plain Python .py file, the stderr output was:
F tensorflow/core/kernels/conv_ops_3d.cc:369] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted
In another run the stderr reported:
F tensorflow/core/common_runtime/gpu/gpu_util.cc:296] GPU->CPU Memcpy failed
The problem is that the TensorFlow applications are grabbing a lot of memory. On Linux you can run top to see what is going on; on our machine we saw that each TensorFlow process was grabbing 0.55t!
When you run the process inside a Jupyter notebook and do not shut the notebook down, that memory is not released. At some point you start a process that cannot get the memory it needs and it dies; inside a notebook the only message you get is that the kernel has died.
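By default TensorFlow tries to reserve most of the GPU's memory for each process, which is why a single notebook left open can starve every later run. A minimal sketch, assuming TensorFlow 1.x, of capping what one notebook kernel may claim (the 0.3 fraction is only an illustrative value):

import tensorflow as tf

# limit this kernel to roughly 30% of the GPU's memory instead of nearly all of it
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))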
Can anyone help with this?
Upvotes: 0
Views: 1893
Reputation: 791
One suggestion is to place the following snippet before you import tensorflow:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"   # hide all GPUs from TensorFlow
Added after @Nicolas's comment:
Yes, this disables the GPU, which is not what is wanted.
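If the GPU is still needed, an alternative sketch (again assuming TensorFlow 1.x) is to let each session allocate GPU memory on demand rather than grabbing almost all of it up front:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # allocate GPU memory as needed
sess = tf.Session(config=config)

With allow_growth, several notebook kernels can share one GPU, although a kernel that is not shut down will still hold whatever memory it has already claimed.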
Upvotes: 1