Mohit Lamba

Reputation: 1403

Preferred way to decrease learning rate for Adam optimiser in PyTorch

I have been seeing code that uses an Adam optimizer, and the way it decreases the learning rate is as follows:


    optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

    # ... training loop, with optimizer.step() inside ...

    if iteration >= some_threshold:
        for param_group in optimizer.param_groups:
            param_group['lr'] = 0.001

I thought we had the same learning rate for all parameters. So why iterate over the param_groups and set the learning rate individually for each one?

Wouldn't the following be faster and have an identical effect?


    from torch.optim.lr_scheduler import MultiStepLR

    optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
    scheduler = MultiStepLR(optimizer, milestones=[some_threshold], gamma=0.1)

    # ... training loop ...
    # optimizer.step()
    # scheduler.step()

Thank you

Upvotes: 5

Views: 3092

Answers (1)

Michael Jungo

Reputation: 33010

You need to iterate over param_groups because the optimiser always organises its parameters into groups: if you don't specify multiple groups of parameters, you automatically get a single group containing all of them. So you are not setting the learning rate for each parameter, but rather for each parameter group.
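As a minimal sketch (the toy model is just for illustration), you can check that passing net.parameters() gives exactly one group, and that multiple groups only appear if you create them explicitly:

    import torch
    import torch.nn as nn

    net = nn.Linear(4, 2)  # toy model, just for illustration

    # Passing net.parameters() creates exactly one parameter group.
    optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
    print(len(optimizer.param_groups))      # 1
    print(optimizer.param_groups[0]['lr'])  # 0.01

    # Multiple groups only exist if you create them explicitly,
    # e.g. to give the weight and the bias different learning rates.
    optimizer = torch.optim.Adam([
        {'params': [net.weight], 'lr': 0.01},
        {'params': [net.bias], 'lr': 0.001},
    ])
    print(len(optimizer.param_groups))      # 2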

In fact, the learning rate schedulers from PyTorch do the same thing. From _LRScheduler (the base class of the learning rate schedulers):

    with _enable_get_lr_call(self):
        if epoch is None:
            self.last_epoch += 1
            values = self.get_lr()
        else:
            warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
            self.last_epoch = epoch
            if hasattr(self, "_get_closed_form_lr"):
                values = self._get_closed_form_lr()
            else:
                values = self.get_lr()

    for param_group, lr in zip(self.optimizer.param_groups, values):
        param_group['lr'] = lr

Yes, it has an identical effect in this case, but it wouldn't be any faster.
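If you want to convince yourself, here is a minimal sketch (toy model, dummy loss and a made-up milestone, purely for illustration) showing that MultiStepLR simply writes the reduced value into param_group['lr'] once the milestone is reached:

    import torch
    import torch.nn as nn
    from torch.optim.lr_scheduler import MultiStepLR

    net = nn.Linear(4, 2)  # toy model, just for illustration
    optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
    scheduler = MultiStepLR(optimizer, milestones=[3], gamma=0.1)

    for epoch in range(5):
        optimizer.zero_grad()
        loss = net(torch.randn(8, 4)).sum()  # dummy loss to drive one step
        loss.backward()
        optimizer.step()
        scheduler.step()
        print(epoch, optimizer.param_groups[0]['lr'])
        # the printed lr drops from 0.01 to 0.001 once the milestone is reached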

Upvotes: 6
