Reputation: 16503
I created a few summary ops throughout my graph like so:
tf.summary.scalar('cross_entropy', cross_entropy)
tf.summary.scalar('accuracy', accuracy)
and of course merged them and created a writer:
sess = tf.InteractiveSession()
summaries = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(TENSORBOARD_TRAINING_DIR, sess.graph)
tf.global_variables_initializer().run()
and I write these in each training iteration:
summary, acc = sess.run([summaries, accuracy], feed_dict={...})
train_writer.add_summary(summary, i)
when I load TensorBoard, I get some weird results: the charts keep jumping back to step 0 and plot several overlapping curves instead of one continuous line.
I did check - there are a few previous event files in my training summaries folder:
$ ls /tmp/tv_train/
events.out.tfevents.1517210066.xxxxxxx.local
events.out.tfevents.1517210097.xxxxxxx.local
...
events.out.tfevents.1517210392.xxxxxxx.local
I think I must have restarted the training loop at some point, causing multiple summaries to be logged at the same step indices (0, 1, etc.).
How can I append to old training logs? Can I point my writer to a specific tfevents file to "start back where I left off"?
Upvotes: 4
Views: 1790
Reputation: 19123
You can't (easily) reopen and "append" to an existing events file, but that's not necessary.
TensorBoard will display sequential event files just fine, as long as the step value in the records is consistent.
When you save a summary, you specify a step value, which indicates at which point on the x-axis the summary should be plotted.
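For example, here is a minimal sketch (TF 1.x, as in your code; the log directory and dummy scalar are made up for illustration) of two consecutive "runs" writing to the same directory. Each run produces its own events file, but because the step values keep increasing, TensorBoard draws a single continuous curve:
import tensorflow as tf

LOG_DIR = '/tmp/tv_train'  # hypothetical log directory for illustration

value = tf.placeholder(tf.float32, [])
tf.summary.scalar('accuracy', value)
summaries = tf.summary.merge_all()

with tf.Session() as sess:
    # first "run": steps 0..99
    writer = tf.summary.FileWriter(LOG_DIR, sess.graph)
    for step in range(100):
        writer.add_summary(sess.run(summaries, {value: step / 100.0}), step)
    writer.close()

    # second "run": a new writer and a new events file, but the steps continue at 100
    writer = tf.summary.FileWriter(LOG_DIR)
    for step in range(100, 200):
        writer.add_summary(sess.run(summaries, {value: step / 200.0}), step)
    writer.close()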
The graph goes "back in time" because at every new run you restart the step counter from 0. To keep it consistent across multiple runs, you should define a global_step variable that is saved to the checkpoint along with the rest of the network. This way, when you restore the network in the next training run, the global step picks up where it left off and your graphs will no longer look weird.
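A rough sketch of that approach (TF 1.x; the checkpoint path, optimizer, and stand-in loss below are placeholders, not your actual graph): create a global_step, let the optimizer increment it, save and restore it with a Saver, and pass it as the step when writing summaries.
import tensorflow as tf

CKPT_DIR = '/tmp/tv_train_ckpt'  # hypothetical checkpoint directory
LOG_DIR = '/tmp/tv_train'

global_step = tf.train.get_or_create_global_step()

# stand-in for your real model/loss
cross_entropy = tf.get_variable('cross_entropy', [], initializer=tf.ones_initializer())
tf.summary.scalar('cross_entropy', cross_entropy)
summaries = tf.summary.merge_all()

# passing global_step here makes the optimizer increment it on every training step
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
    cross_entropy, global_step=global_step)

saver = tf.train.Saver()  # global_step is a variable, so it gets checkpointed too

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    ckpt = tf.train.latest_checkpoint(CKPT_DIR)
    if ckpt:
        saver.restore(sess, ckpt)  # restores the step counter along with the weights

    writer = tf.summary.FileWriter(LOG_DIR, sess.graph)
    for _ in range(1000):
        _, summary, step = sess.run([train_op, summaries, global_step])
        writer.add_summary(summary, step)  # x-axis keeps increasing across restarts
    saver.save(sess, CKPT_DIR + '/model.ckpt', global_step=global_step)
    writer.close()
With this, each restart continues the step count, so the successive event files in the log directory line up into one curve and there is no need to append to the old file.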
Upvotes: 3