Reputation: 1359
I have defined the following optimizer with different learning rates for each parameter group:
optimizer = optim.SGD([
    {'params': param_groups[0], 'lr': CFG.lr, 'weight_decay': CFG.weight_decay},
    {'params': param_groups[1], 'lr': 2*CFG.lr, 'weight_decay': 0},
    {'params': param_groups[2], 'lr': 10*CFG.lr, 'weight_decay': CFG.weight_decay},
    {'params': param_groups[3], 'lr': 20*CFG.lr, 'weight_decay': 0},
], lr=CFG.lr, momentum=0.9, weight_decay=CFG.weight_decay, nesterov=CFG.nesterov)
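(param_groups here is just a list of four parameter lists; the exact split doesn't matter for the question, but it is built roughly like this, with a placeholder model standing in for my real network:)
import torch.nn as nn

# Placeholder modules: "backbone" stands for the pretrained part, "head" for the new layers
backbone = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16))
head = nn.Sequential(nn.Conv2d(16, 4, 1))

# 4 groups: backbone weights / backbone biases+norms / head weights / head biases+norms
param_groups = [[], [], [], []]
for p in backbone.parameters():
    param_groups[1 if p.ndim == 1 else 0].append(p)
for p in head.parameters():
    param_groups[3 if p.ndim == 1 else 2].append(p)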
Now I want to use an LR scheduler to update all of the learning rates, not only the first one, because by default a scheduler would only update param_groups[0], right? I set up:
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=5, T_mult=2, eta_min=CFG.min_lr, last_epoch=-1, verbose=True)
which, after one update, gives me:
SGD (
Parameter Group 0
    dampening: 0
    initial_lr: 0.001
    lr: 0.0009999603905218616
    momentum: 0.9
    nesterov: True
    weight_decay: 0.0001

Parameter Group 1
    dampening: 0
    initial_lr: 0.002
    lr: 0.002
    momentum: 0.9
    nesterov: True
    weight_decay: 0

Parameter Group 2
    dampening: 0
    initial_lr: 0.01
    lr: 0.01
    momentum: 0.9
    nesterov: True
    weight_decay: 0.0001

Parameter Group 3
    dampening: 0
    initial_lr: 0.02
    lr: 0.02
    momentum: 0.9
    nesterov: True
    weight_decay: 0
)
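For completeness, I step the scheduler once per update, roughly like this (train_step and train_loader are simplified stand-ins for my actual loop):
iters = len(train_loader)  # batches per epoch
for epoch in range(CFG.epochs):
    for i, batch in enumerate(train_loader):
        train_step(model, batch, optimizer)   # stand-in for forward/backward/optimizer.step()
        scheduler.step(epoch + i / iters)     # fractional-epoch stepping, as the docs allow
    for g_idx, g in enumerate(optimizer.param_groups):
        print(f"group {g_idx}: lr = {g['lr']}")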
Any idea how to update all the learning rates with a scheduler?
Upvotes: 3
Views: 3602
Reputation: 2496
You are right, a learning rate scheduler should update each parameter group's learning rate, not just the first group's. After a bit of testing, it looks like this problem only occurs with the CosineAnnealingWarmRestarts scheduler. I tested CosineAnnealingLR and a couple of other schedulers, and they updated every group's learning rate:
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, 100, verbose=True)
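For reference, this is roughly the kind of standalone check I mean (dummy parameters standing in for a real model; swap in either scheduler and compare the printed per-group learning rates):
import torch
from torch import nn, optim

# Four dummy parameters standing in for the four real parameter groups
params = [nn.Parameter(torch.zeros(1)) for _ in range(4)]
optimizer = optim.SGD(
    [{'params': [p], 'lr': lr} for p, lr in zip(params, [1e-3, 2e-3, 1e-2, 2e-2])],
    lr=1e-3, momentum=0.9,
)

# Swap between the two schedulers to compare their behaviour
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
# scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=5, T_mult=2)

for epoch in range(3):
    optimizer.step()  # no loss/backward needed just to inspect the LR schedule
    scheduler.step()
    print(epoch, [round(g['lr'], 6) for g in optimizer.param_groups])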
Then, to find the cause of the problem, I had a look at the source code of the learning rate schedulers: https://github.com/pytorch/pytorch/blob/master/torch/optim/lr_scheduler.py
From a quick look through it, there seem to be some differences between the get_lr() implementations of CosineAnnealingLR and CosineAnnealingWarmRestarts:
# CosineAnnealingLR:
def get_lr(self):
    if not self._get_lr_called_within_step:
        warnings.warn("To get the last learning rate computed by the scheduler, "
                      "please use `get_last_lr()`.", UserWarning)

    if self.last_epoch == 0:
        return self.base_lrs
    elif (self.last_epoch - 1 - self.T_max) % (2 * self.T_max) == 0:
        return [group['lr'] + (base_lr - self.eta_min) *
                (1 - math.cos(math.pi / self.T_max)) / 2
                for base_lr, group in
                zip(self.base_lrs, self.optimizer.param_groups)]
    return [(1 + math.cos(math.pi * self.last_epoch / self.T_max)) /
            (1 + math.cos(math.pi * (self.last_epoch - 1) / self.T_max)) *
            (group['lr'] - self.eta_min) + self.eta_min
            for group in self.optimizer.param_groups]
# CosineAnnealingWarmRestarts:
def get_lr(self):
    if not self._get_lr_called_within_step:
        warnings.warn("To get the last learning rate computed by the scheduler, "
                      "please use `get_last_lr()`.", UserWarning)

    return [self.eta_min + (base_lr - self.eta_min) * (1 + math.cos(math.pi * self.T_cur / self.T_i)) / 2
            for base_lr in self.base_lrs]
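Plugging the base learning rates from the question into that formula by hand shows that it should produce a new value for every group, not just the first one (rough check only: eta_min taken as 0 and one full-epoch step with T_0=5 assumed):
import math

base_lrs = [0.001, 0.002, 0.01, 0.02]  # the four base LRs from the question
eta_min, T_cur, T_i = 0.0, 1, 5        # assumed state after one step with T_0=5

# Evaluate the CosineAnnealingWarmRestarts formula for every group
new_lrs = [eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * T_cur / T_i)) / 2
           for base_lr in base_lrs]
print(new_lrs)  # ~[0.000905, 0.001809, 0.009045, 0.018090], every group gets scaled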
So after looking through the code, I feel this is a bug. Even the documentation of CosineAnnealingWarmRestarts states that it will "Set the learning rate of each parameter group using a cosine annealing schedule".
Upvotes: 3