Kevin

Reputation: 3239

Why would a neural network's validation loss and accuracy fluctuate at first?

I am training a neural network, and at the beginning of training my network's loss and accuracy on the validation data fluctuate a lot, but towards the end of training they stabilize. I am using reduce-learning-rate-on-plateau for this network. Could it be that the network starts with a high learning rate, and as the learning rate decreases both accuracy and loss stabilize?
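For context, the reduce-on-plateau schedule mentioned above can be sketched in a few lines of plain Python (a minimal illustration of the mechanism, not any particular library's API; the `factor` and `patience` values are illustrative):

```python
def reduce_lr_on_plateau(val_losses, lr, factor=0.5, patience=3, min_lr=1e-6):
    """Return the learning rate after scanning a history of validation losses.

    The rate is multiplied by `factor` whenever the loss fails to improve
    for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
    return lr

# Loss improves twice, then plateaus for 3 epochs -> one halving of the rate
print(reduce_lr_on_plateau([1.0, 0.8, 0.8, 0.8, 0.8], lr=0.1))  # -> 0.05
```

Libraries such as Keras and PyTorch ship equivalent schedulers as callbacks, with the same monitor/factor/patience knobs.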

[Plots of validation accuracy and loss during training]

Upvotes: 4

Views: 2108

Answers (1)

jmsinusa

Reputation: 1652

For SGD, the change in the parameters is the product of the learning rate and the gradient of the loss with respect to the parameters.

θ = θ − α ∇θ E[J(θ)]

Every step it takes will be in a sub-optimal direction (i.e. slightly wrong), as the optimiser has usually only seen some of the training data (a mini-batch). At the start of training you are relatively far from the optimal solution, so the gradient ∇θ E[J(θ)] is large, and therefore each sub-optimal step has a large effect on your loss and accuracy.
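The "only seen some of the values" point is just mini-batch sampling noise. A quick sketch (hypothetical toy data, plain Python) shows several mini-batch gradients scattering around the full-batch gradient:

```python
import random

random.seed(0)

# Toy 1-D least-squares problem: loss = mean((theta * x - y)**2)
xs = [random.uniform(-1, 1) for _ in range(1000)]
ys = [3.0 * x + random.gauss(0, 0.1) for x in xs]
theta = 0.0  # far from the optimum (roughly 3.0)
data = list(zip(xs, ys))

def grad(batch):
    # d/dtheta of the mean squared error over the batch
    return sum(2 * (theta * x - y) * x for x, y in batch) / len(batch)

full = grad(data)
minis = [grad(random.sample(data, 32)) for _ in range(5)]
print(f"full-batch gradient: {full:.3f}")
print("mini-batch gradients:", [f"{g:.3f}" for g in minis])
```

Each mini-batch gradient points in roughly, but not exactly, the right direction, which is why individual SGD steps are "slightly wrong".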

Over time, as you (hopefully) get closer to the optimal solution, the gradient is smaller, so the steps become smaller, meaning that the effects of being slightly wrong are diminished. Smaller errors on each step make your loss decrease more smoothly, which reduces the fluctuations.
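You can see the shrinking-step effect on a toy quadratic loss J(θ) = θ² (a hand-rolled sketch; mini-batch noise aside, the update α·∇J shrinks as θ approaches the minimum at 0):

```python
theta, alpha = 5.0, 0.1

for step in range(5):
    grad = 2 * theta          # dJ/dtheta for J(theta) = theta**2
    update = alpha * grad     # size of the parameter change
    theta -= update
    print(f"step {step}: theta={theta:.4f}, |update|={abs(update):.4f}")
```

Each printed update is 20% smaller than the last, so any fixed-size error in the step direction has a correspondingly smaller effect on the loss, which is exactly the stabilisation seen late in training.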

Upvotes: 5

Related Questions