dylan7

Reputation: 813

Reporting accuracy and loss issues with MonitoredTrainingSession

I am performing transfer learning on InceptionV3 for a dataset of 5 types of flowers. All layers are frozen except the output layer. My implementation is heavily based on the Cifar10 tutorial from TensorFlow, and the input dataset is formatted in the same way as Cifar10.
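
Roughly, the transfer-learning setup looks like this (a simplified sketch, not my actual code; the scope name, optimizer, and the logits/labels tensors are just stand-ins):

# Simplified sketch of the frozen-backbone setup; names are illustrative only.
# Only the variables of the new output layer are handed to the optimizer,
# so the rest of InceptionV3 stays frozen.
output_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                scope='InceptionV3/Logits')
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
global_step = tf.contrib.framework.get_or_create_global_step()
train_op = tf.train.AdamOptimizer(1e-3).minimize(
    loss, var_list=output_vars, global_step=global_step)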

I have added a MonitoredTrainingSession (like in the tutorial) to report the accuracy and loss after a certain number of steps. Below is the section of the code for the MonitoredTrainingSession (almost identical to the tutorial):

class _LoggerHook(tf.train.SessionRunHook):
    """Logs loss and accuracy every LOG_FREQUENCY session.run calls."""

    def begin(self):
        self._step = -1
        self._start_time = time.time()

    def before_run(self, run_context):
        self._step += 1
        # Fetch loss and accuracy alongside whatever this session.run() call asks for.
        return tf.train.SessionRunArgs([loss, accuracy])

    def after_run(self, run_context, run_values):
        if self._step % LOG_FREQUENCY == 0:
            current_time = time.time()
            duration = current_time - self._start_time
            self._start_time = current_time

            loss_value = run_values.results[0]
            acc = run_values.results[1]

            # Note: this is steps/sec; multiply by the batch size for true examples/sec.
            examples_per_sec = LOG_FREQUENCY / duration
            sec_per_batch = duration / LOG_FREQUENCY

            format_str = ('%s: step %d, loss = %.2f, acc = %.2f '
                          '(%.1f examples/sec; %.3f sec/batch)')
            print(format_str % (datetime.now(), self._step, loss_value, acc,
                                examples_per_sec, sec_per_batch))

config = tf.ConfigProto()
config.gpu_options.allow_growth = True

if MODE == 'train':

    file_writer = tf.summary.FileWriter(LOGDIR, tf.get_default_graph())
    with tf.train.MonitoredTrainingSession(
            save_checkpoint_secs=70,
            checkpoint_dir=LOGDIR,
            hooks=[tf.train.StopAtStepHook(last_step=NUM_EPOCHS * NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN),
                   tf.train.NanTensorHook(loss),
                   _LoggerHook()],
            config=config) as mon_sess:
        # Restore the pre-trained InceptionV3 weights into the frozen layers.
        original_saver.restore(mon_sess, INCEPTION_V3_CHECKPOINT)
        print("Proceeding to training stage")

        while not mon_sess.should_stop():
            mon_sess.run(train_op, feed_dict={training: True})
            print('acc: %f' % mon_sess.run(accuracy, feed_dict={training: False}))
            print('loss: %f' % mon_sess.run(loss, feed_dict={training: False}))

When the two lines printing the accuracy and loss under mon_sess.run(train_op,...) are removed, the loss and accuracy printed from after_run report, after surprisingly only 20 minutes of training, that the model is performing very well on the training set and that the loss is decreasing. Even the moving-average loss was reporting great results. The accuracy eventually approaches greater than 90% for multiple random batches.

After the training session had been reporting high accuracy for a while, I stopped it, restored the model, and ran it on random batches from the same training set. It performed poorly, only achieving between 50% and 85% accuracy. I confirmed it was restored properly because it did perform better than a model with an untrained output layer.

I then went back to training from the last checkpoint. The accuracy was initially low, but after about 10 mini-batch runs it went back above 90%. I then repeated the process, this time adding the two lines that evaluate the loss and accuracy after the training operation. Those two evaluations reported that the model was having issues converging and was performing poorly, while the evaluations via before_run and after_run now only occasionally showed high accuracy and low loss (the results jumped around). Still, after_run sometimes reported 100% accuracy. (I think the loss of consistency is because after_run is also called for mon_sess.run(accuracy,...) and mon_sess.run(loss,...).)

Why would the results reported from MonitoredTrainingSession indicate that the model is performing well when it really isn't? Aren't the two operations in SessionRunArgs fed the same mini-batch as train_op, so that they indicate the model's performance on that batch before the gradient update?
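
For reference, this is how I understand the difference between the two ways of evaluating the metrics (a minimal sketch reusing the loss, accuracy, train_op and training names from my code above):

# Sketch only, reusing the names from my training loop above.

# (a) Fetched together with train_op (which is effectively what the hook's
#     SessionRunArgs does): loss/accuracy are computed on the SAME mini-batch
#     that the gradient update uses, with values from before the update.
_, loss_val, acc_val = mon_sess.run([train_op, loss, accuracy],
                                    feed_dict={training: True})

# (b) Separate run calls: each call dequeues a NEW mini-batch from the input
#     queue, so these numbers come from different batches than the train step.
acc_val = mon_sess.run(accuracy, feed_dict={training: False})
loss_val = mon_sess.run(loss, feed_dict={training: False})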

Here is the code I used for restoring and testing the model (based on the cifar10 tutorial):

elif MODE == 'test':
    init = tf.global_variables_initializer()
    ckpt = tf.train.get_checkpoint_state(LOGDIR)
    if ckpt and ckpt.model_checkpoint_path:
        with tf.Session(config=config) as sess:
            init.run()
            saver = tf.train.Saver()
            print(ckpt.model_checkpoint_path)
            saver.restore(sess, ckpt.model_checkpoint_path)
            global_step = tf.contrib.framework.get_or_create_global_step()

            # Start the queue runners that feed the input pipeline.
            coord = tf.train.Coordinator()
            threads = []
            try:
                for qr in tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
                    threads.extend(qr.create_threads(sess, coord=coord, daemon=True, start=True))
                print('model restored')
                i = 0
                num_iter = 4 * NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / BATCH_SIZE
                print(num_iter)
                while not coord.should_stop() and i < num_iter:
                    print("loss: %.2f," % loss.eval(feed_dict={training: False}), end="")
                    print("acc: %.2f" % accuracy.eval(feed_dict={training: False}))
                    i += 1
            except Exception as e:
                print(e)
                coord.request_stop(e)
            coord.request_stop()
            coord.join(threads, stop_grace_period_secs=10)

Update:

So I was able to fix the issue, though I am not sure why the fix worked. In the arg_scope for the Inception model I was passing an is_training Boolean placeholder for the batch norm and dropout used by Inception. However, when I removed the placeholder and just set the is_training keyword to True, the accuracy on the training set with the restored model was extremely high. This was the same model checkpoint that had previously performed poorly. When I trained it, I always fed the is_training placeholder as True. Having is_training set to True while testing means batch norm uses the sample mean and variance.

Why would telling batch norm to use the sample mean and sample standard deviation, as it does during training, increase the accuracy?

This would also mean that the dropout layer is dropping units, and yet the model's accuracy during testing, on both the training set and the test set, is higher with the dropout layer enabled.
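
For context, this is roughly how the placeholder was wired into the model (a simplified sketch; images and NUM_CLASSES stand in for my actual input pipeline and number of flower classes):

import tensorflow as tf
from tensorflow.contrib import slim
from tensorflow.contrib.slim.nets import inception

# Simplified sketch of the placeholder pattern; `images` and NUM_CLASSES
# are stand-ins for my actual input pipeline and label count.
training = tf.placeholder(tf.bool, name='is_training')

with slim.arg_scope(inception.inception_v3_arg_scope()):
    logits, end_points = inception.inception_v3(
        images, num_classes=NUM_CLASSES, is_training=training)

# During training I fed {training: True}; at test time {training: False}.
# Hard-coding is_training=True instead of feeding the placeholder is what
# made the restored checkpoint report high accuracy again.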

Update 2: I went through the TensorFlow slim InceptionV3 model code that the arg_scope in my code is referencing. I removed the final dropout layer after the 8x8 average pool and the accuracy remained at around 99%. However, when I set is_training to False only for the batch norm layers, the accuracy dropped back to around 70%. Here is the arg_scope from slim\nets\inception_v3.py with my modification:

with variable_scope.variable_scope(
    scope, 'InceptionV3', [inputs, num_classes], reuse=reuse) as scope:
  with arg_scope(
      # Original: [layers_lib.batch_norm, layers_lib.dropout], is_training=is_training
      [layers_lib.batch_norm], is_training=False):
    net, end_points = inception_v3_base(
        inputs,
        scope=scope,
        min_depth=min_depth,
        depth_multiplier=depth_multiplier)

I tried this both with the dropout layer removed and with the dropout layer kept (passing is_training=True to the dropout layer).

Upvotes: 2

Views: 902

Answers (1)

Allen Lavoie

Reputation: 5808

(Summarizing from dylan7's debugging in the question's comments)

Batch norm relies on variables to save the summary statistics it normalizes with. These are only updated when is_training is True through an UPDATE_OPS collection (see the batch_norm documentation). If these update ops don't get run (or the variables are overwritten), there may be transient "reasonable" statistics based on each batch which get lost when is_training is False (testing data is not, and should not be, used to inform batch_norm summary statistics).
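
For reference, a minimal sketch of the usual way to make sure those update ops actually run (assuming a train_op built from an optimizer and a global_step, as in the question's training code):

# Make the train op depend on the batch_norm moving-average update ops, so the
# statistics used when is_training is False actually get written.
# `optimizer` and `global_step` stand in for whatever the training code uses.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss, global_step=global_step)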

Upvotes: 2
