kingaj

Reputation: 791

TensorFlow GPU application crashes Jupyter notebook kernel

We are running TensorFlow applications on a GPU from multiple Jupyter notebooks. Every once in a while, one of the runs crashes its notebook, and the only notification is "The kernel has crashed...".

When we placed the code into a Python .py file, the stderr output was:

F tensorflow/core/kernels/conv_ops_3d.cc:369] Check failed:   stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted

In another run, stderr reported:

F tensorflow/core/common_runtime/gpu/gpu_util.cc:296] GPU->CPU Memcpy failed

The problem is that the TensorFlow applications are grabbing a lot of memory. On Linux you can run top to see what is going on. On our machine, top reported 0.55t (about 0.55 TiB) for each TensorFlow process!
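The same numbers can be read from inside a notebook cell instead of switching to top. The following is a minimal sketch, assuming Linux and assuming the figure above came from top's VIRT column, which corresponds to VmSize in /proc/<pid>/status (VmRSS corresponds to RES). The function names are hypothetical.

```python
import os


def parse_status_kib(status_text):
    """Extract VmSize and VmRSS (in KiB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Lines look like "VmSize:   123456 kB"; keep only the number.
            fields[key] = int(rest.split()[0])
    return fields


def process_memory_kib(pid="self"):
    """Read memory figures for a process; pid defaults to the current one."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status_kib(f.read())


# Only attempt the read where /proc is available (Linux).
if os.path.exists("/proc/self/status"):
    print(process_memory_kib())
```

A very large VmSize next to a modest VmRSS would mean the process has reserved address space without actually using that much physical RAM, so it is worth looking at both values.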

When you run the process inside a Jupyter notebook and do not shut the notebook down, the notebook does not release the memory. Eventually you will start a process that cannot get the memory it needs, and it will die. If it is running inside a notebook, all you are told is that the kernel has died.
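One possible way to work around both issues, sketched here as an assumption rather than a confirmed fix, is to launch each run in its own Python process from the notebook. The operating system reclaims the child's memory when it exits, and its stderr can be captured instead of being lost behind the "kernel has died" message. The `run_isolated` helper and the `job` string are hypothetical stand-ins for the real TensorFlow code.

```python
import subprocess
import sys


def run_isolated(code, timeout=None):
    """Run Python source in a fresh interpreter; return (returncode, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout and stderr instead of losing them
        text=True,            # decode the captured output to str
        timeout=timeout,
    )
    return result.returncode, result.stderr


# Hypothetical stand-in for a TensorFlow job; replace with the real run.
job = "import sys; sys.stderr.write('simulated failure\\n'); sys.exit(3)"

returncode, stderr = run_isolated(job)
print("exit code:", returncode)
print("stderr:", stderr.strip())
```

On POSIX systems a negative return code means the child was killed by a signal, which is what an "Aborted" crash like the ones above would produce.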

Can anyone help with this?

Upvotes: 0

Views: 1893

Answers (1)

kingaj

Reputation: 791

One suggestion is to place the following snippet before you import tensorflow:

import os
os.environ["CUDA_VISIBLE_DEVICES"]="-1"

Added after @Nicolas's comment:

Yes, this disables the GPU, which is not what is wanted.
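Since the snippet above hides every GPU, a variation is to expose only selected devices, so that each notebook is pinned to a different GPU. This is a sketch that assumes the machine has more than one GPU; the helper name `set_visible_gpus` is hypothetical. It does not by itself limit how much memory a process takes on the GPU it can see.

```python
import os


def set_visible_gpus(device_ids):
    """Set CUDA_VISIBLE_DEVICES from a list of integer GPU indices.

    An empty list falls back to "-1", which hides every GPU, matching the
    original snippet.
    """
    value = ",".join(str(i) for i in device_ids) if device_ids else "-1"
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Must run before `import tensorflow`, as with the original snippet.
set_visible_gpus([0])
```

For example, one notebook could call `set_visible_gpus([0])` and another `set_visible_gpus([1])`, so the two never compete for the same device.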

Upvotes: 1
