Ivan Novikov

Reputation: 578

Why does loss skyrocket during convolutional neural net training?

I am training a simple CNN in PyTorch for segmentation on a very small dataset (just a few images, as this is only a proof of concept). For some reason, the loss randomly skyrockets to as high as 6 and the IoU (intersection over union, my accuracy metric) drops to 0 during training before both recover. Why could this be happening?

IoU graph

Loss curve
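
For context, by IoU I mean intersection over union; below is a minimal sketch of a binary version of the metric (illustrative only, not my exact implementation):

    import torch

    def iou(pred, target, threshold=0.5, eps=1e-6):
        # Binarise the predicted probability map and compare it with the mask.
        pred = (pred > threshold).float()
        target = target.float()
        intersection = (pred * target).sum()
        union = pred.sum() + target.sum() - intersection
        # eps avoids division by zero on empty masks.
        return (intersection + eps) / (union + eps)

    # Example: random prediction vs. random binary mask.
    pred = torch.rand(1, 1, 64, 64)
    mask = torch.randint(0, 2, (1, 1, 64, 64))
    print(iou(pred, mask).item())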

Upvotes: 0

Views: 273

Answers (1)

snowflake

Reputation: 951

Instability. It is actually common; take a look at published papers and you will see the same thing. During gradient descent, the optimiser can hit "rough patches" in the loss landscape that give a locally bad solution, hence the high loss.

Having said that, some of these spikes can signify poor hyperparameter or network architecture choices. From my experience, one possible cause of the spikes is the use of weight decay. Weight decay provides regularisation, but in my own work I found it to cause a lot of instability, so nowadays I don't use it any more.
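
If you want to rule it out, weight decay is usually just an optimiser argument in PyTorch. A minimal sketch, assuming Adam and a placeholder model (your actual setup isn't shown in the question):

    import torch
    import torch.nn as nn

    # Placeholder model; stands in for your actual segmentation CNN.
    model = nn.Conv2d(3, 1, kernel_size=3, padding=1)

    # With weight decay: a non-zero value adds L2-style regularisation to every
    # parameter update and, in my experience, can contribute to loss spikes.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

    # Without weight decay: set it to zero (the default) and see if the spikes persist.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)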

The spikes in your graphs don't look too bad; I wouldn't worry about them.

Upvotes: 2
