I am trying to train a model with TensorFlow and then obtain a graph of the loss. I am using the below snippet:
for step in range(NUM_BATCHES):
    img, lbl = sess.run([batch_images, batch_labels])
    _, loss_value = sess.run([train_op, cost], feed_dict={X: img, Y: lbl, p_keep_conv: 0.8, p_keep_hidden: 0.5})
    print("Step %d, loss %1.5f" % (step, loss_value))
    sys.stdout.flush()
    tf.summary.scalar('loss', loss_value)
    summary_writer.add_summary(sess.run(tf.summary.merge_all()), step)
When I open the log directory in TensorBoard, I can't see a graph of the loss against steps. In fact, I don't even see an Events section; I only have a Scalars section, which displays a separate LOSS_xx entry for each batch.
What am I missing?
Upvotes: 0
Views: 150
Reputation: 29972
The mistake here is that summaries are TensorFlow operations: functions in the tf.summary
namespace, such as tf.summary.scalar
, take tensors and build summary ops, just like other graph-construction functions. Because your code calls tf.summary.scalar inside the loop, it creates a new, independent summary op on every iteration (hence the separate LOSS_xx entries), instead of recording all loss values under a single tag.
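The build-once/run-many distinction is easy to see even outside TensorFlow. As a rough analogy in plain Python (this is not TensorFlow code), the summary op plays the role of a closure that is constructed once and then called on every step:

```python
# Analogy only: a summary op is defined once and evaluated many times,
# like a closure built once and called on every step.
def make_loss_logger():
    history = []                        # stands in for the event file
    def log(step, loss_value):          # stands in for sess.run(summary_op)
        history.append((step, loss_value))
    return log, history

log_loss, history = make_loss_logger()  # build once, before the loop
for step in range(3):
    log_loss(step, 1.0 / (step + 1))    # run once per iteration

# Re-creating the logger inside the loop would instead yield three
# unrelated single-point histories -- the analogue of the bug above.
```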
To fix this, create the summary operations only once, before the training loop starts. You can then fetch the merged summary in the same sess.run call as the training op, so each step needs only one run:
tf.summary.scalar('loss', cost)
all_summaries = tf.summary.merge_all()

for step in range(NUM_BATCHES):
    img, lbl = sess.run([batch_images, batch_labels])
    _, loss_value, summary = sess.run([train_op, cost, all_summaries], feed_dict={X: img, Y: lbl, p_keep_conv: 0.8, p_keep_hidden: 0.5})
    print("Step %d, loss %1.5f" % (step, loss_value))
    sys.stdout.flush()
    summary_writer.add_summary(summary, step)
Upvotes: 2