Reputation: 32111
I've trained 3 models and am now running code that loads each of the 3 checkpoints in sequence and runs predictions using them. I'm using the GPU.
When the first model is loaded it pre-allocates the entire GPU memory (which I want for working through the first batch of data). But it doesn't unload memory when it's finished. When the second model is loaded, using both tf.reset_default_graph()
and with tf.Graph().as_default()
the GPU memory still is fully consumed from the first model, and the second model is then starved of memory.
Is there a way to resolve this, other than using Python subprocesses or multiprocessing to work around the problem (the only solution I've found on via google searches)?
Upvotes: 84
Views: 171601
Reputation: 11
I think, thant Oliver Wilken's user (Jun 30, 2017 at 8:30) is the best solution. But in tensorflow 2 there is eager execution instead of sessions. So you just need to import all the tensorflow and keras stuff inside the run_tensorflow(...) function.
Upvotes: 0
Reputation: 87
I was able to solve an OOM error just now with the garbage collector.
import gc
gc.collect()
model.evaluate(x1, y1)
gc.collect()
model.evaluate(x2, y2)
gc.collect()
etc.
Based on what Yaroslav Bulatov said in their answer (that tf deallocates GPU memory when the object is destroyed), I surmised that it could just be that the garbage collector hadn't run yet. Forcing it to collect freed me up, so that might be a good way to go.
Upvotes: 8
Reputation: 324
To free my resources, I use:
import os, signal
os.kill(os.getpid(), signal.SIGKILL)
Upvotes: 0
Reputation: 465
I have trained my models in a for loop for different parameters when I got this error after 120 models trained. Afterwards I could not even train a simple model if I did not kill the kernel. I was able to solve my issue by adding the following line before building the model:
tf.keras.backend.clear_session()
(see https://www.tensorflow.org/api_docs/python/tf/keras/backend/clear_session)
Upvotes: 1
Reputation: 412
I use numba to release GPU. With TensorFlow, I cannot find an effective method.
import tensorflow as tf
from numba import cuda
a = tf.constant([1.0,2.0,3.0],shape=[3],name='a')
b = tf.constant([1.0,2.0,3.0],shape=[3],name='b')
with tf.device('/gpu:1'):
c = a+b
TF_CONFIG = tf.ConfigProto(
gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.1),
allow_soft_placement=True)
sess = tf.Session(config=TF_CONFIG)
sess.run(tf.global_variables_initializer())
i=1
while(i<1000):
i=i+1
print(sess.run(c))
sess.close() # if don't use numba,the gpu can't be released
cuda.select_device(1)
cuda.close()
with tf.device('/gpu:1'):
c = a+b
TF_CONFIG = tf.ConfigProto(
gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5),
allow_soft_placement=True)
sess = tf.Session(config=TF_CONFIG)
sess.run(tf.global_variables_initializer())
while(1):
print(sess.run(c))
Upvotes: 9
Reputation: 397
I am figuring out which option is better in the Jupyter Notebook. Jupyter Notebook occupies the GPU memory permanently even a deep learning application is completed. It usually incurs the GPU Fan ERROR that is a big headache. In this condition, I have to reset nvidia_uvm and reboot the linux system regularly. I conclude the following two options can remove the headache of GPU Fan Error but want to know which is better.
Environment:
First Option
Put the following code at the end of the cell. The kernel immediately ended upon the application runtime is completed. But it is not much elegant. Juputer will pop up a message for the died ended kernel.
import os
pid = os.getpid()
!kill -9 $pid
Section Option
The following code can also end the kernel with Jupyter Notebook. I do not know whether numba is secure. Nvidia prefers the "0" GPU that is the most used GPU by personal developer (not server guys). However, both Neil G and mradul dubey have had the response: This leaves the GPU in a bad state.
from numba import cuda
cuda.select_device(0)
cuda.close()
It seems that the second option is more elegant. Can some one confirm which is the best choice?
Notes:
It is not such the problem to automatically release the GPU memory in the environment of Anaconda by direct executing "$ python abc.py". However, I sometimes need to use Jyputer Notebook to handle .ipynb application.
Upvotes: 3
Reputation: 491
You can use numba library to release all the gpu memory
pip install numba
from numba import cuda
device = cuda.get_current_device()
device.reset()
This will release all the memory
Upvotes: 42
Reputation: 687
Now there seem to be two ways to resolve the iterative training model or if you use future multipleprocess pool to serve the model training, where the process in the pool will not be killed if the future finished. You can apply two methods in the training process to release GPU memory meanwhile you wish to preserve the main process.
Here is a helper function using multiprocess.Process which can open a new process to run your python written function and reture value instead of using Subprocess,
# open a new process to run function
def process_run(func, *args):
def wrapper_func(queue, *args):
try:
logger.info('run with process id: {}'.format(os.getpid()))
result = func(*args)
error = None
except Exception:
result = None
ex_type, ex_value, tb = sys.exc_info()
error = ex_type, ex_value,''.join(traceback.format_tb(tb))
queue.put((result, error))
def process(*args):
queue = Queue()
p = Process(target = wrapper_func, args = [queue] + list(args))
p.start()
result, error = queue.get()
p.join()
return result, error
result, error = process(*args)
return result, error
Upvotes: 5
Reputation: 57973
GPU memory allocated by tensors is released (back into TensorFlow memory pool) as soon as the tensor is not needed anymore (before the .run call terminates). GPU memory allocated for variables is released when variable containers are destroyed. In case of DirectSession (ie, sess=tf.Session("")) it is when session is closed or explicitly reset (added in 62c159ff)
Upvotes: 1
Reputation: 2714
A git issue from June 2016 (https://github.com/tensorflow/tensorflow/issues/1727) indicates that there is the following problem:
currently the Allocator in the GPUDevice belongs to the ProcessState, which is essentially a global singleton. The first session using GPU initializes it, and frees itself when the process shuts down.
Thus the only workaround would be to use processes and shut them down after the computation.
Example Code:
import tensorflow as tf
import multiprocessing
import numpy as np
def run_tensorflow():
n_input = 10000
n_classes = 1000
# Create model
def multilayer_perceptron(x, weight):
# Hidden layer with RELU activation
layer_1 = tf.matmul(x, weight)
return layer_1
# Store layers weight & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
pred = multilayer_perceptron(x, weights)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
for i in range(100):
batch_x = np.random.rand(10, 10000)
batch_y = np.random.rand(10, 1000)
sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
print "finished doing stuff with tensorflow!"
if __name__ == "__main__":
# option 1: execute code with extra process
p = multiprocessing.Process(target=run_tensorflow)
p.start()
p.join()
# wait until user presses enter key
raw_input()
# option 2: just execute the function
run_tensorflow()
# wait until user presses enter key
raw_input()
So if you would call the function run_tensorflow()
within a process you created and shut the process down (option 1), the memory is freed. If you just run run_tensorflow()
(option 2) the memory is not freed after the function call.
Upvotes: 39