Reputation: 215
I've trained the exact same model (with the exact same training dataset) twice, but the results are very different, and I'm confused about the behavior of their loss curves.
The loss curve of the 1st experiment (red curve) suddenly jumps up near the end of the first epoch, and then slowly and steadily decreases.
However, the loss curve of the 2nd experiment (blue curve) doesn't jump up anywhere and steadily decreases until convergence. The loss after 20 epochs is much lower than in the 1st experiment, and the output quality is very good.
I don't know what caused that big jump the first time. Both experiments used the same model and the same training dataset.
Description of the model: My project is sparse-view CT image reconstruction. My goal is to reconstruct the sparse-view image using an iterative method with a CNN inside each iteration. This is very similar to the LEARN algorithm proposed by Chen.
The process consists of 30 iterations, and at each iteration I use a CNN to learn the regularization term.
Since I have 30 iterations, with 3+ layers of CNN in each iteration (I've been trying architectures of different complexity), I understand there will be a large number of parameters and layers.
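Roughly, the structure looks like the sketch below (a simplified PyTorch-style illustration, not my exact code; the class names, channel counts, and the `data_grad_fn` data-fidelity term are just placeholders):

```python
import torch
import torch.nn as nn

class RegularizerCNN(nn.Module):
    """Small CNN playing the role of the learned regularization term in one iteration."""
    def __init__(self, channels=48):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class UnrolledRecon(nn.Module):
    """LEARN-style unrolled reconstruction: a fixed number of iterations, each with its own CNN."""
    def __init__(self, n_iters=30):
        super().__init__()
        self.regularizers = nn.ModuleList([RegularizerCNN() for _ in range(n_iters)])
        self.step_sizes = nn.Parameter(torch.full((n_iters,), 0.1))  # learnable step size per iteration

    def forward(self, x, data_grad_fn):
        # x: current image estimate, shape (B, 1, 512, 512)
        # data_grad_fn: gradient of the data-fidelity term (projection / backprojection)
        for k, reg in enumerate(self.regularizers):
            x = x - self.step_sizes[k] * (data_grad_fn(x) + reg(x))
        return x
```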
So far, for all the CNN architectures I've tested, this "big jump" has happened quite regularly.
The training data consists of 3600 512×512 sparse-view CT images, and the test data consists of 360 sparse-view CT images.
The batch size is 1, and the number of epochs is 20.
UPDATE: Thank you all for the suggestions. After reading the answers, I started thinking about exploding/vanishing gradient issues. So I changed ReLU to ELU, changed the weight initialization from Xavier to He, and added gradient clipping. The results turned out great. I ran the standard model (the same model as mentioned above) five more times, and the loss decreased steadily in all of them. For the other models with different CNN architectures, their losses also decreased and no major spikes occurred.
The code already shuffles the training dataset at the beginning of every epoch. What I'm planning to do next is add batch normalization and try max-norm regularization.
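For reference, the changes amount to something like this (a hedged PyTorch-style sketch; my actual code may differ in framework and details, and the `max_grad_norm` value is just an example):

```python
import torch
import torch.nn as nn

def make_block(in_ch, out_ch):
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
    # He (Kaiming) initialization instead of Xavier
    nn.init.kaiming_normal_(conv.weight, nonlinearity='relu')
    nn.init.zeros_(conv.bias)
    # ELU instead of ReLU
    return nn.Sequential(conv, nn.ELU())

def train_step(model, optimizer, loss_fn, x, y, max_grad_norm=1.0):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # gradient clipping before the optimizer step
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```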
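The max-norm idea I have in mind would look roughly like this (again a PyTorch-style sketch with placeholder values; the cutoff of 3.0 is arbitrary):

```python
import torch

@torch.no_grad()
def apply_max_norm(model, max_norm=3.0):
    # rescale each weight tensor so its L2 norm never exceeds max_norm
    for name, param in model.named_parameters():
        if name.endswith('weight') and param.dim() > 1:
            norm = param.norm(2)
            if norm > max_norm:
                param.mul_(max_norm / norm)

# called right after each optimizer step in the training loop:
# optimizer.step()
# apply_max_norm(model)
```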
Upvotes: 6
Views: 6416
Reputation: 4691
This is going to be a similar answer to @Anant's, but put in a different manner. I usually prefer a backtracking approach to build intuition.
In the case of deep neural networks, this can occur due to exploding/vanishing gradients. You may want to either do weight clipping or adjust the weight initialization so that the weights stay closer to 1, which reduces the chance of explosion.
Also, if your learning rate is too large, such a problem can occur. In that case, you can either lower the learning rate or use learning rate decay.
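For instance, weight clipping can be done by clamping the parameters after every optimizer step (a rough PyTorch-style sketch; `model` and the clipping limit are placeholders):

```python
import torch

@torch.no_grad()
def clip_weights(model, limit=1.0):
    # hard-clip every parameter into [-limit, limit] after each update
    for param in model.parameters():
        param.clamp_(-limit, limit)

# in the training loop:
# optimizer.step()
# clip_weights(model, limit=1.0)
```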
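A sketch of the learning-rate-decay option (assuming PyTorch; `model`, `loss_fn`, and `train_loader` are placeholders, and the decay factor is arbitrary):

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # start from a smaller learning rate
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(20):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the learning rate once per epoch
```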
Upvotes: 3
Reputation: 2113
The loss value can suddenly jump only if there is an extreme update to the parameters, which essentially happens when you take a big gradient step - this problem is generally called exploding gradients.
As for the potential reasons for this explosion, it is probably a nasty combination of the random initialization of the weights, the learning rate, and possibly the particular batch of training data that was passed during that iteration.
Without knowing the exact details of the model, I can only suggest a general solution - you should try a smaller learning rate and make sure your training data is shuffled well. Hope this somewhat helps.
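For example, something along these lines (a hedged PyTorch-style sketch; `train_dataset`, `model`, and the learning rate are placeholders):

```python
import torch
from torch.utils.data import DataLoader

# reshuffle the training data every epoch and use a more conservative learning rate
train_loader = DataLoader(train_dataset, batch_size=1, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```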
Upvotes: 1