Reputation: 471
I am passing the same input, i.e. the same data and the same true labels, to Keras train_on_batch() and test_on_batch(). I want to know why I am getting different loss values from the two functions.
code:
model_del_fin.compile(optimizer=SGD(lr=0.001, decay=0.001/15), loss='categorical_crossentropy', metrics=['accuracy'])

iters_per_epoch = 1285 // 50
print(iters_per_epoch)
num_epochs = 15
outs_store_freq = 20  # in iters
print_loss_freq = 20  # in iters

iter_num = 0
epoch_num = 0
model_outputs = []
loss_history = []

while epoch_num < num_epochs:
    print("ok")
    while iter_num < iters_per_epoch:
        x_train, y_train = next(train_it2)
        loss_history += [model_del_fin.train_on_batch([x_train, x_train], y_train)]
        print("Iter {} loss: {}".format(iter_num, loss_history[-1]))
        print(model_del_fin.test_on_batch([x_train, x_train], y_train))
        iter_num += 1
    print("EPOCH {} FINISHED".format(epoch_num + 1))
    epoch_num += 1
    iter_num = 0  # reset counter
**Result:**
Iter 0 loss: [5.860205, 0.24]
[2.5426426, 0.68]
Iter 1 loss: [3.5718067, 0.48]
[1.7102847, 0.68]
Iter 2 loss: [2.0221999, 0.68]
[1.310905, 0.94]
Iter 3 loss: [1.6114614, 0.74]
[1.2987132, 0.92]
Upvotes: 4
Views: 1101
Reputation: 19836
The problem stems from feeding data to a model in train mode vs. inference mode; the differences include:

- **Train mode**: Dropout is active, and BatchNormalization uses batch statistics for mean & variance.
- **Inference mode**: Dropout rates are set to zero, and BatchNormalization uses exponential moving average statistics for mean & variance, computed during training.

Other differences also apply. Assuming the models used in your code are based on those in your other question(s) (i.e. VGG), it'll be the BN layers. As a workaround, you can temporarily set a global learning phase via:
K.set_learning_phase(0) # INFERENCE MODE
K.set_learning_phase(1) # TRAIN MODE
Note, however, that either of these must be executed before the model is instantiated, else the change will not apply. Also, this may not solve the problem entirely, as BN is known to have other issues (which I happen to be currently investigating) - but the results should agree much more closely nonetheless.
Lastly, if you first call train_on_batch() and then call test_on_batch(), the two will disagree because test_on_batch() executes after train_on_batch() has updated the weights - thus, call test_on_batch() first, then train_on_batch().
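For illustration, a minimal sketch of that reordering applied to the inner loop of your code (reusing your model_del_fin and train_it2 as-is):

x_train, y_train = next(train_it2)
# evaluate on the current weights first ...
print(model_del_fin.test_on_batch([x_train, x_train], y_train))
# ... then train, which updates the weights afterwards
loss_history += [model_del_fin.train_on_batch([x_train, x_train], y_train)]
print("Iter {} loss: {}".format(iter_num, loss_history[-1]))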
Full example:
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.models import Model
import tensorflow.keras.backend as K
import numpy as np

K.set_learning_phase(0)  # INFERENCE MODE - must be set before the model is built

ipt = Input((12,))
x = Dropout(0.1)(ipt)
out = Dense(12)(x)
model = Model(ipt, out)
model.compile('adam', 'mse')

X = np.random.randn(32, 12)
print(model.train_on_batch(X, X))  # trains (updates weights)
print(model.test_on_batch(X, X))   # evaluates on the updated weights
print(model.train_on_batch(X, X))  # matches the preceding test loss
2.1212778 # train_on_batch()
2.1128066 # test_on_batch()
2.1128066 # train_on_batch()
Try using K.set_learning_phase(1) instead to see the difference - or comment it out entirely, as 1 is the default value anyway.
Upvotes: 6