Reputation: 1
I am running image segmentation code in PyTorch, based on the LinkNet architecture. The optimizer is initially set as:
self.optimizer = torch.optim.Adam(params=self.net.parameters(), lr=lr)
Then I changed it to SGD with Nesterov momentum, hoping to improve performance:
self.optimizer = torch.optim.SGD(params=self.net.parameters(), lr=lr, momentum=0.9, nesterov=True)
However, the performance is worse with Nesterov: with Adam the loss converges to 0.19, but with Nesterov it only reaches 0.34.
By the way, the learning rate is divided by 5 if the loss does not decrease for 3 consecutive epochs, and the lr can be adjusted at most 3 times. After that, training stops.
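For reference, here is a minimal pure-Python sketch of the plateau schedule described above, assuming it works as stated (divide the lr by 5 after 3 consecutive epochs without improvement, at most 3 reductions, then stop); the class and attribute names are placeholders, not from the actual code:

```python
class PlateauSchedule:
    """Divide lr by `factor` after `patience` stagnant epochs, at most `max_reductions` times."""

    def __init__(self, lr, factor=0.2, patience=3, max_reductions=3):
        self.lr = lr
        self.factor = factor            # 0.2 == dividing the lr by 5
        self.patience = patience
        self.max_reductions = max_reductions
        self.best = float("inf")
        self.bad_epochs = 0
        self.reductions = 0
        self.done = False               # set when training should stop

    def step(self, loss):
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                if self.reductions < self.max_reductions:
                    self.lr *= self.factor
                    self.reductions += 1
                    self.bad_epochs = 0
                else:
                    self.done = True    # 3 reductions used up: end training
        return self.lr
```

PyTorch's built-in `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.2, patience=3)` implements essentially the same plateau logic, minus the "stop after 3 reductions" rule.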
I am wondering why this happens and what I should do to improve it. Thanks a lot for the replies :)
Upvotes: 0
Views: 1081
Reputation: 777
Your question seems to rely on the assumption that SGD with Nesterov momentum would definitely perform better than Adam. However, no optimizer is universally better than another; you always have to evaluate it for your particular model (layers, activation functions, loss, etc.) and dataset.
Are you increasing the number of epochs for SGD? Usually, SGD takes much longer to converge than Adam. Note that recent studies show that despite training faster, Adam generalizes worse to the validation and test datasets (https://arxiv.org/abs/1712.07628). An alternative to that is to start the optimization with Adam, and then after some epochs, change the optimizer to SGD.
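The Adam-then-SGD switch suggested above could look roughly like this sketch; `model`, `switch_epoch`, and the hyperparameter values are illustrative placeholders, not taken from your setup:

```python
import torch

model = torch.nn.Linear(10, 2)          # stand-in for your LinkNet
lr, switch_epoch, num_epochs = 1e-3, 10, 20

# Phase 1: start with Adam for fast early convergence.
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

for epoch in range(num_epochs):
    if epoch == switch_epoch:
        # Phase 2: rebuild the optimizer over the same parameters.
        # SGD starts with fresh (zero) momentum buffers at this point.
        optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                    momentum=0.9, nesterov=True)
    # ... training step goes here:
    # optimizer.zero_grad(); loss.backward(); optimizer.step()
```

Note that the switch discards Adam's per-parameter state, so the loss often bumps up briefly before SGD settles in; lowering the lr at the switch point is a common adjustment.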
Upvotes: 2