Reputation: 1868
I have a TensorFlow regression model. I don't think the details of the model's layers are related to the question, so I'm skipping that. I can add that if you think it would be useful.
I compile with the following code. Loss and metric are mean squared error.
model.compile(
    loss=tf.keras.losses.MeanSquaredError(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=['mse']
)
Now, I run the following code to train the network and evaluate it. I train it for 2 epochs, then I evaluate the model on the same data, both with the evaluate method and by hand using the predict method and the MSE formula.
print('fit')
model.fit(X, y, epochs=2, batch_size=32)
print('evaluate')
print(model.evaluate(X, y))
print('manual evaluate')
print(((y - model.predict(X).ravel()) ** 2).mean())
Here is the result:
Epoch 1/2
3152/3152 [==============================] - 12s 3ms/step - loss: 7.7276 - mse: 7.7275
Epoch 2/2
3152/3152 [==============================] - 11s 4ms/step - loss: 0.9898 - mse: 0.9894
evaluate
3152/3152 [==============================] - 2s 686us/step - loss: 1.3753 - mse: 1.3748
[1.3753225803375244, 1.3747814893722534]
manual evaluate
1.3747820755885116
I have a slight regularization, so the loss is a bit greater than the MSE, as expected.
But, as you can see, the MSE is 0.98 at the end of the last epoch, while I get an MSE of 1.37 when I evaluate with the evaluate method or compute it manually. As far as I know, the model uses the weights from after the last epoch, so those two numbers should be equal, right? What am I missing here? I tried different batch_size values and epoch counts; the evaluated MSE is always higher than the MSE reported at the last epoch of fit.
Note: y is a one-dimensional NumPy array:
y.shape
> (100836,)
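A side note on the manual computation above (a sketch with made-up numbers, not the question's data): since y has shape (100836,) while model.predict(X) returns shape (100836, 1), the .ravel() call matters. Without it, NumPy broadcasting turns the difference into an (N, N) matrix and the "MSE" is silently wrong:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])           # shape (3,), like the 1-D targets
pred = np.array([[1.1], [2.1], [2.9]])  # shape (3, 1), like model.predict(X)

# Correct: ravel() makes both arrays shape (3,), differences are elementwise.
correct = ((y - pred.ravel()) ** 2).mean()

# Wrong: (3,) minus (3, 1) broadcasts to a (3, 3) matrix of all pairwise
# differences, and the mean averages 9 values instead of 3.
wrong = ((y - pred) ** 2).mean()

print(correct)  # ~0.01
print(wrong)    # ~1.21, much larger
```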
Edit: I ran the fit method with the validation_data parameter, using the same (X, y) as the validation data:
model.fit(X, y, epochs=2, batch_size=32, validation_data=(X, y))
Output:
Epoch 1/2
3152/3152 [==============================] - 23s 7ms/step - loss: 7.9766 - mse: 7.9764 - val_loss: 2.0284 - val_mse: 2.0280
Epoch 2/2
3152/3152 [==============================] - 22s 7ms/step - loss: 0.9839 - mse: 0.9836 - val_loss: 1.3436 - val_mse: 1.3431
evaluate
[1.3436073064804077, 1.3430677652359009]
Now, it makes some sense. The val_mse of the last epoch matches the evaluate result. But I was expecting the mse and val_mse values in the progress bar to be the same, since the training data and validation data are identical. I think my understanding of what the progress bar shows is not correct. Can someone explain how I should interpret the progress bar, and why the mse and val_mse values on it are different?
Upvotes: 0
Views: 814
Reputation: 460
The reason the metrics (loss, in your case) differ between the training and validation steps for the same data is simple. During training, the model changes its parameters from batch to batch, and the progress bar shows the mean of the metric over all batches, each measured with different intermediate weights. During the validation step, on the contrary, the parameters of the network are frozen; the parameters used are those obtained after processing the last batch the network has seen. This explains the difference.
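A minimal NumPy sketch of that averaging effect, using made-up per-batch losses rather than numbers from the question: the progress bar reports the mean over batches seen with changing weights, while evaluate measures everything with the final weights, so the two need not agree (in either direction):

```python
import numpy as np

# Hypothetical per-batch MSE values recorded during one epoch,
# decreasing as the weights improve batch by batch.
batch_mse = np.array([8.0, 4.0, 2.0, 1.0, 0.5])

# What the progress bar shows at the end of the epoch: the mean over
# all batches, each computed with *different* intermediate weights.
progress_bar_mse = batch_mse.mean()

# A post-epoch evaluation uses only the final weights; in this toy
# setup we stand in for that with the last batch's value. The point
# is only that the two quantities measure different things.
final_weights_mse = batch_mse[-1]

print(progress_bar_mse, final_weights_mse)  # 3.1 0.5
```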
The question of why the validation loss turned out to be bigger than the training loss is more subtle. One reason might be that your model has layers that behave differently during training and validation (for example, BatchNorm, as noticed by Frightera). Another reason might be an improper learning rate: if it is too big, the parameters change too much between steps, skipping over the real minimum. Even with Adam optimization this can be the case.
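To illustrate the BatchNorm point, here is a pure-NumPy sketch (not Keras internals, and with made-up moving averages): in training mode the layer normalizes with the current batch's statistics, while in inference mode it uses accumulated moving averages, so the same input produces different outputs in the two modes:

```python
import numpy as np

def batchnorm(x, moving_mean, moving_var, training, eps=1e-5):
    """Toy 1-D batch normalization (no learned gamma/beta)."""
    if training:
        # Training mode: use the statistics of the current batch.
        mean, var = x.mean(), x.var()
    else:
        # Inference mode: use the accumulated moving averages.
        mean, var = moving_mean, moving_var
    return (x - mean) / np.sqrt(var + eps)

x = np.array([1.0, 2.0, 3.0, 4.0])

# Moving averages accumulated over earlier batches (hypothetical values).
out_train = batchnorm(x, moving_mean=0.0, moving_var=1.0, training=True)
out_infer = batchnorm(x, moving_mean=0.0, moving_var=1.0, training=False)

# Same input, different outputs: one way train-time and eval-time
# metrics can diverge even on identical data.
print(np.allclose(out_train, out_infer))  # False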
To check whether the problem is the learning rate, try making it much smaller. If the difference in the metric persists, then your network has layers that behave differently during the training and validation phases.
There might be other reasons for the difference in the metrics. For example, if the training data is noisy, the network cannot fit it well, and the loss will fluctuate around its mean, which is normal. To see whether this is the case, study the per-batch loss plots (for example, using TensorBoard).
Upvotes: 2