Reputation: 77
I have been training my Inception ResNet V2 using TensorFlow, and logging the accuracy/loss etc. via TensorBoard.
Now when I resumed training today from the checkpoint I had stopped at earlier, my accuracy jumped from 86% to 97% almost instantly (within a few global steps).
Looking at the loss plot, it still seems to be decreasing gradually, but the accuracy had this huge bump. Is there an obvious/logical explanation for this? I resumed training at epoch 21 (having stopped at 20), with 1339 global steps per epoch.
Upvotes: 2
Views: 421
Reputation: 697
As @P-Gn pointed out in the accepted answer, that's because tf.metrics are all streaming metrics by design.
You can reset streaming metrics, or, if you care only about single-batch accuracy, you can use a simple function:
import tensorflow as tf

def non_streaming_accuracy(predictions, labels):
    # fraction of correct predictions in the current batch only; nothing accumulates across calls
    return tf.reduce_mean(tf.cast(tf.equal(predictions, labels), tf.float32))
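For example (a minimal sketch assuming TF1-style graph code; logits and labels are hypothetical tensors, not from the question), you could log the per-batch value to TensorBoard:

predictions = tf.cast(tf.argmax(logits, axis=1), labels.dtype)  # class predictions from the hypothetical logits
batch_accuracy = non_streaming_accuracy(predictions, labels)
# this scalar reflects only the current batch, so it would not show the jump described in the question
tf.summary.scalar('batch_accuracy', batch_accuracy)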
Upvotes: 0
Reputation: 24581
That is because you are using a streaming accuracy, which accumulates all statistics since the beginning of time -- well, of training time.
Until you stopped training, the streaming accuracy was returning the accuracy averaged since the beginning.
When you resumed training, the streaming accuracy op was reset, and it now outputs the mean accuracy computed since you resumed. It is much higher because it no longer averages over the earlier, lower accuracy values from when your model was weak.
I actually posted something yesterday on how to reset streaming metrics from time to time to avoid this continuous accumulation.
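A minimal sketch of one way to do such a reset (assuming a TF1 graph with labels and predictions tensors already defined; the scope name is illustrative):

import tensorflow as tf

with tf.variable_scope('streaming_acc'):
    accuracy, update_op = tf.metrics.accuracy(labels=labels, predictions=predictions)

# the metric's running total/count are local variables; re-initializing them clears the accumulation
reset_op = tf.variables_initializer(
    tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES, scope='streaming_acc'))

Running reset_op (e.g. at the start of every epoch) makes the metric start averaging from scratch, which is also what effectively happened here when the graph was restored from the checkpoint.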
Upvotes: 3
Reputation: 3294
I think the contrib accuracy is the problem.
https://github.com/tensorflow/tensorflow/issues/9498.
It does not reset, so your accuracy is basically the running average of all the per-batch accuracies up to that point. When you reloaded the graph, this running average was reset. So, good news: your network is training great.
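As a toy numeric illustration (made-up numbers, not from the posts above) of why the reported value jumps once the accumulated statistics are gone:

per_batch_acc = [0.60, 0.80, 0.95, 0.97, 0.97]          # hypothetical per-batch accuracies over training
running_avg = sum(per_batch_acc) / len(per_batch_acc)   # ~0.86: early low values drag the streaming metric down
after_reset = sum(per_batch_acc[-2:]) / 2               # ~0.97: only recent batches count after the reset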
Upvotes: 2