Reputation: 1403
I have been seeing code that uses an Adam optimizer, and the way it decreases the learning rate is as follows:
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
# (training...
#     optimizer.step()
#     ...)
if iteration >= some_threshold:
    for param_group in optimizer.param_groups:
        param_group['lr'] = 0.001
I thought we had the same learning rate for all parameters. So why iterate over the param_groups and set the learning rate for each one individually?
Wouldn't the following be faster and have an identical effect?
from torch.optim.lr_scheduler import MultiStepLR

optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
scheduler = MultiStepLR(optimizer, milestones=[some_threshold], gamma=0.1)
# (training...
#     optimizer.step()
#     scheduler.step())
Thank you
Upvotes: 5
Views: 3092
Reputation: 33010
You need to iterate over param_groups because even if you don't specify multiple groups of parameters when constructing the optimiser, you still automatically get a single group. That doesn't mean you set the learning rate for each parameter, but rather for each parameter group.
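For example, here is a minimal sketch (the toy model and the two-group split are just for illustration): passing net.parameters() gives you a single group, while passing a list of dicts gives you one group per dict, each of which can carry its own learning rate.

import torch

net = torch.nn.Linear(4, 2)  # toy model, for illustration only

# Passing an iterable of parameters creates a single param_group.
opt_single = torch.optim.Adam(net.parameters(), lr=0.01)
print(len(opt_single.param_groups))  # 1

# Passing a list of dicts creates one param_group per dict,
# and each group can have its own learning rate.
opt_multi = torch.optim.Adam([
    {'params': net.weight, 'lr': 0.01},
    {'params': net.bias, 'lr': 0.001},
])
print(len(opt_multi.param_groups))  # 2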
In fact, the learning rate schedulers in PyTorch do the same thing. From _LRScheduler
(the base class of learning rate schedulers):
with _enable_get_lr_call(self):
    if epoch is None:
        self.last_epoch += 1
        values = self.get_lr()
    else:
        warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
        self.last_epoch = epoch
        if hasattr(self, "_get_closed_form_lr"):
            values = self._get_closed_form_lr()
        else:
            values = self.get_lr()

for param_group, lr in zip(self.optimizer.param_groups, values):
    param_group['lr'] = lr
Yes, it has an identical effect in this case, but it wouldn't be faster: the scheduler performs the same loop over param_groups internally.
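As a quick sanity check, here is a sketch comparing the two approaches on a toy model (the model, threshold and iteration count are made up for illustration; the scheduler's step counting may shift the drop by one iteration, but both optimisers end up with the same learning rate):

import torch
from torch.optim.lr_scheduler import MultiStepLR

net = torch.nn.Linear(4, 2)   # toy model
some_threshold = 3            # hypothetical milestone
num_iterations = 5

# Manual approach: overwrite the lr in every param_group.
opt_manual = torch.optim.Adam(net.parameters(), lr=0.01)
for iteration in range(num_iterations):
    opt_manual.step()
    if iteration >= some_threshold:
        for param_group in opt_manual.param_groups:
            param_group['lr'] = 0.001

# Scheduler approach: MultiStepLR multiplies the lr by gamma at the milestone.
opt_sched = torch.optim.Adam(net.parameters(), lr=0.01)
scheduler = MultiStepLR(opt_sched, milestones=[some_threshold], gamma=0.1)
for iteration in range(num_iterations):
    opt_sched.step()
    scheduler.step()

print(opt_manual.param_groups[0]['lr'])  # 0.001
print(opt_sched.param_groups[0]['lr'])   # 0.001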
Upvotes: 6