Reputation: 471
I am passing the same input, i.e. the same data and the same true labels, to Keras train_on_batch() and test_on_batch(). I want to know why I am getting different loss values from the two functions.
code:
model_del_fin.compile(optimizer=SGD(lr=0.001, decay=0.001/15), loss='categorical_crossentropy', metrics=['accuracy'])

iters_per_epoch = 1285 // 50
print(iters_per_epoch)
num_epochs = 15
outs_store_freq = 20  # in iters
print_loss_freq = 20  # in iters

iter_num = 0
epoch_num = 0
model_outputs = []
loss_history = []

while epoch_num < num_epochs:
    print("ok")
    while iter_num < iters_per_epoch:
        x_train, y_train = next(train_it2)
        loss_history += [model_del_fin.train_on_batch([x_train, x_train], y_train)]
        print("Iter {} loss: {}".format(iter_num, loss_history[-1]))
        print(model_del_fin.test_on_batch([x_train, x_train], y_train))
        iter_num += 1
    print("EPOCH {} FINISHED".format(epoch_num + 1))
    epoch_num += 1
    iter_num = 0  # reset counter
**Result:**
Iter 0 loss: [5.860205, 0.24]
[2.5426426, 0.68]
Iter 1 loss: [3.5718067, 0.48]
[1.7102847, 0.68]
Iter 2 loss: [2.0221999, 0.68]
[1.310905, 0.94]
Iter 3 loss: [1.6114614, 0.74]
[1.2987132, 0.92]
Upvotes: 4
Views: 1101
Reputation: 19836
The problem stems from feeding data to a model in train mode vs. inference mode; the differences include:

- **Train mode**: Dropout is active, and BatchNormalization uses batch statistics for mean & variance.
- **Inference mode**: Dropout rates are set to zero, and BatchNormalization uses exponential moving average statistics for mean & variance, computed during training.

Other differences also apply. Assuming the models used in your code are based on those in your other question(s) (i.e. VGG), it'll be the BN layers. As a workaround, you can temporarily set a global learning phase via:
K.set_learning_phase(0) # INFERENCE MODE
K.set_learning_phase(1) # TRAIN MODE
Note, however, that either of these must be executed before the model is instantiated, else the change will not apply. Also, this may not solve the problem entirely, as BN is known to have other issues (which I happen to be currently investigating) - but the results should agree much more closely nonetheless.
Lastly, if you first call train_on_batch() and then call test_on_batch(), the two will disagree because test_on_batch() executes after train_on_batch() has updated the weights - thus, call test_on_batch() first, then train_on_batch().
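For illustration, a minimal sketch of that reordering applied to the inner loop of your code (reusing your model_del_fin and train_it2 as-is):

x_train, y_train = next(train_it2)
# evaluate on the current weights first ...
print(model_del_fin.test_on_batch([x_train, x_train], y_train))
# ... then train, which updates the weights afterwards
loss_history += [model_del_fin.train_on_batch([x_train, x_train], y_train)]
print("Iter {} loss: {}".format(iter_num, loss_history[-1]))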
Full example:
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.models import Model
import tensorflow.keras.backend as K
import numpy as np

K.set_learning_phase(0)  # INFERENCE MODE - must be set before the model is built

ipt = Input((12,))
x = Dropout(0.1)(ipt)
out = Dense(12)(x)
model = Model(ipt, out)
model.compile('adam', 'mse')

X = np.random.randn(32, 12)
print(model.train_on_batch(X, X))  # trains (updates weights)
print(model.test_on_batch(X, X))   # evaluates on the updated weights
print(model.train_on_batch(X, X))  # matches the preceding test loss
2.1212778 # train_on_batch()
2.1128066 # test_on_batch()
2.1128066 # train_on_batch()
Try using K.set_learning_phase(1) instead to see the difference - or comment it out entirely, as 1 is the default value anyway.
Upvotes: 6