Reputation: 31
I am using Keras with TensorFlow to implement a deep neural network. When I plot the loss against the number of iterations, there is a significant jump in the loss after each epoch. In reality, the losses of the individual mini-batches should vary from one another, but Keras calculates a running average of the loss over the mini-batches, which is why we obtain a smooth curve instead of an erratic one. The running-average accumulator is reset after each epoch, which is why we observe a jump in the loss.
I would like to disable this averaging and instead obtain the raw loss values, which will vary for each mini-batch. So far I have tried the reduction argument of the loss function, but it only controls how the examples within a mini-batch are combined. The following code sums the losses of all training examples within a mini-batch:
tf.keras.losses.BinaryCrossentropy(reduction='sum')
I also tried writing a custom loss function but that doesn't help either.
Upvotes: 1
Views: 1454
Reputation: 33
Understanding Noisy Loss Curves in Keras (TensorFlow 2.13)
TL;DR: This is a smoothing artifact due to the way Keras computes running means for all its losses and metrics.
This answer is the same as the one in Keras loss value significant jump.
Why the jumpy Loss Curves?
It took me quite some time to understand why there were jumps between epochs during training, and I noticed many others discussing similar issues on various forums. I searched for bugs in my own training pipeline and tried to understand the internals. In the end it turned out to be just a case of poor documentation and a smoothing artifact.
The reason for these jumps is a bit subtle. Keras calculates its metrics and losses as running means over the current epoch. This can make the loss curve look quite noisy at the beginning of an epoch, but it smooths out as more batches are averaged in. Here's the catch: when the next epoch starts, the running mean is reset, and because the loss has decreased over the course of the previous epoch, the fresh average starts much lower than the previous epoch's final average. This creates the staircase-like curve in your training plot.
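To see why the reset alone is enough to produce a jump, here is a minimal, self-contained sketch (plain NumPy, not Keras; the per-batch loss values are made up) that reproduces the staircase effect:

import numpy as np

# Made-up raw batch losses for two consecutive epochs of 100 batches each;
# the loss simply keeps decreasing.
epoch1 = np.linspace(1.0, 0.5, 100)
epoch2 = np.linspace(0.5, 0.3, 100)

def reported_loss(batch_losses):
    # What Keras displays after each batch: the mean over the epoch so far.
    return np.cumsum(batch_losses) / np.arange(1, len(batch_losses) + 1)

reported = np.concatenate([reported_loss(epoch1), reported_loss(epoch2)])

# Last reported value of epoch 1 (~0.75) vs. first reported value of
# epoch 2 (~0.50): the displayed curve "jumps" although the raw batch
# loss barely moved across the epoch boundary.
print(reported[99], reported[100])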
If you want to visualize the raw batch loss, you can use a custom callback. I normally just inherit from the Keras TensorBoard callback and add what I need, but you can of course create your own independent callback, as done in Keras loss value significant jump:
import tensorflow as tf

class CustomKerasTensorBoard(tf.keras.callbacks.TensorBoard):

    def on_epoch_begin(self, epoch, logs=None):
        # The running mean is reset at the start of every epoch.
        self.previous_loss_sum = 0
        super().on_epoch_begin(epoch, logs)

    def on_train_batch_end(self, batch, logs=None):
        # logs["loss"] is the running mean over the (batch + 1) batches seen
        # so far in this epoch; undo the averaging to recover the raw loss
        # of the current batch.
        current_loss_sum = (batch + 1) * logs["loss"]
        current_loss = current_loss_sum - self.previous_loss_sum
        self.previous_loss_sum = current_loss_sum
        logs["loss_raw"] = current_loss
        super().on_train_batch_end(batch, logs)
And add it to your model.fit/model.evaluate/model.predict call.
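For example (the model, data, and log directory below are just placeholders, not part of the original answer):

tensorboard_cb = CustomKerasTensorBoard(log_dir="./logs", update_freq="batch")

# model, x_train and y_train are assumed to exist already.
model.fit(
    x_train, y_train,
    epochs=10,
    batch_size=32,
    callbacks=[tensorboard_cb],
)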
Here's a graphical representation to help you understand the concept better:
The top graph shows the classification accuracy. The middle graph displays the loss. The bottom one illustrates the raw batch loss, which I've left unsmoothed. So, when you see those seemingly erratic jumps in your loss curve, remember that they are an artifact of the smoothing, and your training might be progressing just fine. Keep calm and carry on!
Upvotes: 2
Reputation: 766
I would like to remove the functionality of moving average instead I would like to have raw loss values that will vary for each mini-batch.
That can be achieved with callback functions, but reading through the question it seems you also want to feed the actual loss values back into the calculation.
You can of course do that inside a callback function, or you can do it directly; the example below shows how a basic custom optimization loop works.
[ Sample ]:
import os
from os.path import exists
import tensorflow as tf
import matplotlib.pyplot as plt
from skimage.transform import resize
import numpy as np
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Variables
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
learning_rate = 0.001
global_step = 0
tf.compat.v1.disable_eager_execution()
BATCH_SIZE = 1
IMG_SIZE = (32, 32)
history = [ ]
history_Y = [ ]
list_file = [ ]
list_label = [ ]
for file in os.listdir("F:\\datasets\\downloads\\dark\\train"):
    image = plt.imread("F:\\datasets\\downloads\\dark\\train\\" + file)
    image = resize(image, (32, 32))
    image = np.reshape(image, (1, 32, 32, 3))
    list_file.append(image)
    list_label.append(1)
optimizer = tf.compat.v1.train.ProximalAdagradOptimizer(
    learning_rate,
    initial_accumulator_value=0.1,
    l1_regularization_strength=0.2,
    l2_regularization_strength=0.1,
    use_locking=False,
    name='ProximalAdagrad'
)
var1 = tf.Variable(255.0)
var2 = tf.Variable(10.0)
X_var = tf.compat.v1.get_variable('X', dtype = tf.float32, initializer = tf.random.normal((1, 32, 32, 3)))
y_var = tf.compat.v1.get_variable('Y', dtype = tf.float32, initializer = tf.random.normal((1, 32, 32, 3)))
Z = tf.nn.l2_loss((var1 - X_var) ** 2 + (var2 - y_var) ** 2, name="loss")
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1)
loss = tf.reduce_mean(input_tensor=tf.square(Z))
training_op = optimizer.minimize(cosine_loss(X_var, y_var))
previous_train_loss = 0
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    image = list_file[0]
    X = image
    Y = image
    for i in range(1000):
        global_step = global_step + 1
        train_loss, temp = sess.run([loss, training_op], feed_dict={X_var: X, y_var: Y})
        history.append(train_loss)
        if global_step % 2 == 0:
            var2 = var2 - 0.001
        if global_step % 4 == 0 and train_loss <= previous_train_loss:
            var1 = var1 - var2 + 0.5
        print('steps: ' + str(i))
        print('train_loss: ' + str(train_loss))
        previous_train_loss = train_loss
    sess.close()
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Graph
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
history = history[:-1]
plt.plot(np.asarray(history), label='train_loss')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.legend(loc='lower right')
plt.show()
Upvotes: -1
Reputation: 31
Keras in fact shows the moving average instead of the "raw" loss values. In order to acquire the raw loss values, one should implement a callback as shown below:
import keras

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        # Initialize a list at the beginning of training
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        # Store the loss of the batch that just finished
        self.losses.append(logs.get('loss'))

mycallback = LossHistory()
Then pass it to model.fit:
model.fit(X, Y, epochs=epochs, batch_size=batch, shuffle=True, verbose=0, callbacks=[mycallback])
print(mycallback.losses)
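To reproduce the per-mini-batch plot asked about in the question, you can then plot the collected values directly (a minimal matplotlib sketch, not part of the tested setup above):

import matplotlib.pyplot as plt

plt.plot(mycallback.losses)          # one point per mini-batch
plt.xlabel('Mini-batch (iteration)')
plt.ylabel('Loss')
plt.show()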
I tested with the following configuration
Keras 2.3.1
Tensorflow 2.1.0
Python 3.7.9
Upvotes: 0