Safoora Yousefi

Reputation: 173

Tensorflow memory leak when building graph in a loop

I noticed this when my grid search for selecting hyper-parameters of a Tensorflow (version 1.12.0) model crashed due to explosion in memory consumption.

Notice that, unlike the similar-looking question here, I do close the graph and the session (using context managers), and I am not adding nodes to the graph inside the loop.

I suspected that TensorFlow might maintain global state that does not get cleared between iterations, so I compared the output of globals() before and after an iteration, but observed no difference in the set of global variables.
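A minimal sketch of that check (run_one_rep() is a hypothetical stand-in for one grid-search iteration):

before = set(globals())
run_one_rep()  # hypothetical helper: builds, trains, and tears down one model
print(set(globals()) - before)  # only {'before'} appears; nothing from TF lingers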

I made a small example that reproduces the problem. I train a simple MNIST classifier in a loop and plot the memory consumed by the process:

import matplotlib.pyplot as plt
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # silence TensorFlow C++ logging
import psutil
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
process = psutil.Process(os.getpid())

N_REPS = 100
N_ITER = 10
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x_test, y_test = mnist.test.images, mnist.test.labels

# Run the experiment several times, recording process memory after each rep.
mem = []
for i in range(N_REPS):
    with tf.Graph().as_default():
        net = tf.contrib.layers.fully_connected(x_test, 200)
        logits = tf.contrib.layers.fully_connected(net, 10, activation_fn=None)
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=y_test, logits=logits))
        train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)
        init = tf.global_variables_initializer()
        with tf.Session() as sess:
            # Training loop.
            sess.run(init)
            for _ in range(N_ITER):
                sess.run(train_op)
    mem.append(process.memory_info().rss)
plt.plot(range(N_REPS), mem)
plt.show()

And the resulting plot looks like this:

[Plot: resident memory (RSS) of the process climbing steadily across the 100 repetitions.]

In my actual project, process memory starts at a couple of hundred MB (depending on dataset size) and climbs to 64 GB until my system runs out of memory. A few things I tried slow the growth, such as using placeholders and feed_dicts instead of relying on convert_to_tensor, but the steady increase is still there, only slower.
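For reference, a minimal sketch of the placeholder variant I mean, assuming the usual MNIST shapes (784-dim images, 10-dim one-hot labels):

with tf.Graph().as_default():
    x = tf.placeholder(tf.float32, shape=[None, 784])  # assumed input shape
    y = tf.placeholder(tf.float32, shape=[None, 10])   # assumed label shape
    net = tf.contrib.layers.fully_connected(x, 200)
    logits = tf.contrib.layers.fully_connected(net, 10, activation_fn=None)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
    train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for _ in range(N_ITER):
            # Data enters through feed_dict instead of being baked into
            # the graph as constants by convert_to_tensor.
            sess.run(train_op, feed_dict={x: x_test, y: y_test})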

Upvotes: 6

Views: 3741

Answers (2)

ug2409

Reputation: 344

Try taking the loop inside the session. Don't create the graph and the session for every iteration. Each time the graph is created and the variables are initialized, you are not redefining the old graph but creating a new one, which leads to the memory leak. I was facing a similar issue and was able to solve it by taking the loop inside the session; see the sketch after the quote below.

From How Not to Program the TensorFlow Graph:

  • Be conscious of when you’re creating ops, and only create the ones you need. Try to keep op creation distinct from op execution.
  • Especially if you’re just working with the default graph and running interactively in a regular REPL or a notebook, you can end up with a lot of abandoned ops in your graph. Every time you re-run a notebook cell that defines any graph ops, you aren’t just redefining ops—you’re creating new ones.
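Applied to the code in the question, the restructuring would look roughly like this (a sketch reusing x_test, y_test, N_REPS, N_ITER, and process from the question; running init inside the loop keeps each repetition starting from freshly initialized weights):

mem = []
with tf.Graph().as_default():
    net = tf.contrib.layers.fully_connected(x_test, 200)
    logits = tf.contrib.layers.fully_connected(net, 10, activation_fn=None)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_test, logits=logits))
    train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        for i in range(N_REPS):
            sess.run(init)  # re-initialize variables so each rep starts fresh
            for _ in range(N_ITER):
                sess.run(train_op)
            mem.append(process.memory_info().rss)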

Upvotes: 0

BiBi

Reputation: 7908

You need to clear the default graph after each iteration of your for loop before instantiating a new one. Adding tf.reset_default_graph() at the end of your for loop should resolve the memory leak:

for i in range(N_REPS):
    with tf.Graph().as_default():
        net = tf.contrib.layers.fully_connected(x_test, 200)
        ...
    mem.append(process.memory_info().rss)
    tf.reset_default_graph()  # release the previous graph before the next rep

Upvotes: 2
