Reputation: 600
I am doing GPU-accelerated deep learning with TensorFlow, and am experiencing a memory leak (in host RAM, not in GPU memory).
I have narrowed it down, almost beyond doubt, to the training line
self.sess.run(self.train_step, feed_dict={self.x: trainingdata, self.y_true: traininglabels, self.keepratio: self.training_keep_rate})
If I comment out that line, and only that line (while still doing all my pre-processing, validation/testing, and so on for a few thousand training batches), the memory leak does not happen.
The leak is on the order of a few GB per hour. I am running Ubuntu with 16 GB of RAM plus 16 GB of swap; the system becomes very laggy and unresponsive after 1-3 hours of running, once about a third to half of the RAM is used, which is a bit odd to me since lots of RAM is still free and the CPU is mostly idle when this happens...
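In case anyone wants to reproduce the measurement: a minimal sketch of per-batch memory logging, assuming Linux and Python's standard resource module (log_rss is just an illustrative helper, not a name from my actual code):

import resource

def log_rss(batch_idx):
    # ru_maxrss is the process's peak resident set size; on Linux it is
    # reported in kilobytes. A peak that climbs steadily across batches
    # is the leak showing up.
    peak_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
    print("batch %d: peak RSS %.1f MB" % (batch_idx, peak_mb))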
Here is some of the initializer code (run only once, at the beginning), in case it is relevant:
with tf.name_scope('after_final_layer') as scope:
    self.layer1 = weights["wc1"]
    self.y_conv = network(self.x, weights, biases, self.keepratio)['out']
    variable_summaries(self.y_conv)
    # Note: don't add a softmax layer in the network if you are going to
    # use this cross-entropy function (it expects raw logits)
    self.cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(self.y_conv, self.y_true,
                                                name="softmax/cross_ent"),
        name="reduce_mean")
    self.train_step = tf.train.AdamOptimizer(learning_rate, name="Adam_Optimizer").minimize(self.cross_entropy)
    self.prediction = tf.argmax(self.y_conv, 1)
    self.correct_prediction = tf.equal(self.prediction, tf.argmax(self.y_true, 1))
    self.accuracy = tf.reduce_mean(tf.cast(self.correct_prediction, tf.float32))

if tensorboard:
    # Merge all the summaries and write them out to the directory below
    self.merged = tf.summary.merge_all()
    self.my_writer = tf.summary.FileWriter('/home/james/PycharmProjects/AI_Final/my_tensorboard',
                                           graph=self.sess.graph)

# self.sess.run(tf.initialize_all_variables())  # deprecated predecessor of the line below
tf.global_variables_initializer().run(session=self.sess)
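A common cause of this kind of leak is ops being silently added to the graph inside the training loop, which can be checked between batches with a sketch like the following (the graph inspection API exists in the 0.12-era releases; op_count is just an illustrative name):

# If this number climbs from one training batch to the next, something is
# adding ops to the graph on every step (the classic TensorFlow leak).
op_count = len(self.sess.graph.get_operations())
print("ops in graph: %d" % op_count)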
I'm also happy to post all of the network/initialization code, but I think it is probably irrelevant to this leak.
Am I doing something wrong, or have I found a TensorFlow bug? Thanks in advance!
Update: I will likely submit a bug report soon, but I am first trying to verify that I am not bothering them with my own mistakes. I have added
self.sess.graph.finalize()
to the end of my initialization code. As I understand it, this should throw an exception if I accidentally add anything to the graph afterwards. No exceptions are thrown. I am using TensorFlow 0.12.0-rc0, NumPy 1.12.0b1, and Python 2.7.6. Could those versions be outdated and the source of the problem?
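For context, a self-contained sketch of what finalize() guards against (the constants are just illustrative):

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    tf.constant(1.0)  # fine: the graph is still mutable
g.finalize()
with g.as_default():
    tf.constant(2.0)  # raises RuntimeError: the graph has been finalized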
Upvotes: 2
Views: 1156
Reputation: 600
This issue is fixed in TensorFlow 1.1. Ignore this page, which (at the time of writing) says that the latest stable version is r0.12; 1.1 is the latest stable version. See https://github.com/tensorflow/tensorflow/issues/9590 and https://github.com/tensorflow/tensorflow/issues/9872
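After upgrading, a quick sanity check that the new build is actually the one being imported (a sketch; LooseVersion copes with pre-release tags like 0.12.0-rc0):

import tensorflow as tf
from distutils.version import LooseVersion

# Fails loudly if an old install is still first on the path.
assert LooseVersion(tf.__version__) >= LooseVersion("1.1.0"), tf.__version__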
Upvotes: 1