Reputation: 1403
I have been seeing code that uses an Adam optimizer, and the way it decreases the learning rate is as follows:
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
# (training...
#     optimizer.step()
#     ...)
if iteration >= some_threshold:
    for param_group in optimizer.param_groups:
        param_group['lr'] = 0.001
I thought we had the same learning rate for all parameters. So why iterate over the param_groups and set the learning rate for each one individually?
Wouldn't the following be faster and have an identical effect?
from torch.optim.lr_scheduler import MultiStepLR

optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
scheduler = MultiStepLR(optimizer, milestones=[some_threshold], gamma=0.1)
# (training...
#     optimizer.step()
#     scheduler.step())
Thank you
Upvotes: 5
Views: 3092
Reputation: 33010
You need to iterate over param_groups because even if you don't specify multiple groups of parameters when constructing the optimiser, you still automatically get a single group. That doesn't mean you set the learning rate for each parameter, but rather for each parameter group.
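For example, here is a minimal sketch (the toy model and the two-group split are just for illustration): passing net.parameters() gives you a single group, while passing a list of dicts gives you one group per dict, each of which can carry its own learning rate.

import torch

net = torch.nn.Linear(4, 2)  # toy model, for illustration only

# Passing an iterable of parameters creates a single param_group.
opt_single = torch.optim.Adam(net.parameters(), lr=0.01)
print(len(opt_single.param_groups))  # 1

# Passing a list of dicts creates one param_group per dict,
# and each group can have its own learning rate.
opt_multi = torch.optim.Adam([
    {'params': net.weight, 'lr': 0.01},
    {'params': net.bias, 'lr': 0.001},
])
print(len(opt_multi.param_groups))  # 2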
In fact, the learning rate schedulers in PyTorch do the same thing. From _LRScheduler
(the base class of learning rate schedulers):
with _enable_get_lr_call(self):
    if epoch is None:
        self.last_epoch += 1
        values = self.get_lr()
    else:
        warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
        self.last_epoch = epoch
        if hasattr(self, "_get_closed_form_lr"):
            values = self._get_closed_form_lr()
        else:
            values = self.get_lr()

for param_group, lr in zip(self.optimizer.param_groups, values):
    param_group['lr'] = lr
Yes, it has an identical effect in this case, but it wouldn't be faster: the scheduler performs the same loop over param_groups internally.
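As a quick sanity check, here is a sketch comparing the two approaches on a toy model (the model, threshold and iteration count are made up for illustration; the scheduler's step counting may shift the drop by one iteration, but both optimisers end up with the same learning rate):

import torch
from torch.optim.lr_scheduler import MultiStepLR

net = torch.nn.Linear(4, 2)   # toy model
some_threshold = 3            # hypothetical milestone
num_iterations = 5

# Manual approach: overwrite the lr in every param_group.
opt_manual = torch.optim.Adam(net.parameters(), lr=0.01)
for iteration in range(num_iterations):
    opt_manual.step()
    if iteration >= some_threshold:
        for param_group in opt_manual.param_groups:
            param_group['lr'] = 0.001

# Scheduler approach: MultiStepLR multiplies the lr by gamma at the milestone.
opt_sched = torch.optim.Adam(net.parameters(), lr=0.01)
scheduler = MultiStepLR(opt_sched, milestones=[some_threshold], gamma=0.1)
for iteration in range(num_iterations):
    opt_sched.step()
    scheduler.step()

print(opt_manual.param_groups[0]['lr'])  # 0.001
print(opt_sched.param_groups[0]['lr'])   # 0.001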
Upvotes: 6