Alexander Riedel

Reputation: 1359

PyTorch: using an LR scheduler with param groups of different LRs

I have defined the following optimizer with different learning rates for each parameter group:

  optimizer = optim.SGD([
          {'params': param_groups[0], 'lr': CFG.lr, 'weight_decay': CFG.weight_decay},
          {'params': param_groups[1], 'lr': 2*CFG.lr, 'weight_decay': 0},
          {'params': param_groups[2], 'lr': 10*CFG.lr, 'weight_decay': CFG.weight_decay},
          {'params': param_groups[3], 'lr': 20*CFG.lr, 'weight_decay': 0},
      ], lr=CFG.lr, momentum=0.9, weight_decay=CFG.weight_decay, nesterov=CFG.nesterov)
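
param_groups is just a four-way split of my model's parameters; for reproducing the question any split will do, for example:

    import torch.nn as nn
    import torch.optim as optim

    # Any four-way parameter split reproduces the setup,
    # e.g. the weights and biases of two layers:
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
    param_groups = [
        [model[0].weight],
        [model[0].bias],
        [model[2].weight],
        [model[2].bias],
    ]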

Now I want to use an LR scheduler to update all the learning rates, not only the first one. Or does a scheduler by default only update param_groups[0]? This is my scheduler:

scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=5, T_mult=2, eta_min=CFG.min_lr, last_epoch=-1, verbose=True)

After one update, this gives me:

Parameter Group 0
    dampening: 0
    initial_lr: 0.001
    lr: 0.0009999603905218616
    momentum: 0.9
    nesterov: True
    weight_decay: 0.0001

Parameter Group 1
    dampening: 0
    initial_lr: 0.002
    lr: 0.002
    momentum: 0.9
    nesterov: True
    weight_decay: 0

Parameter Group 2
    dampening: 0
    initial_lr: 0.01
    lr: 0.01
    momentum: 0.9
    nesterov: True
    weight_decay: 0.0001

Parameter Group 3
    dampening: 0
    initial_lr: 0.02
    lr: 0.02
    momentum: 0.9
    nesterov: True
    weight_decay: 0

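For completeness, the update and the printout above come from roughly this:

    # one optimizer step, one scheduler step, then inspect the param groups
    optimizer.step()
    scheduler.step()
    print(optimizer)                                          # full per-group dump, as above
    print([group['lr'] for group in optimizer.param_groups])  # just the learning rates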

Any idea how to update all the learning rates with a scheduler?

Upvotes: 3

Views: 3602

Answers (1)

yutasrobot

Reputation: 2496

You are right, a learning rate scheduler should update every parameter group's learning rate. After a bit of testing, it looks like this problem only occurs with the CosineAnnealingWarmRestarts scheduler. I tested CosineAnnealingLR and a couple of other schedulers, and they updated each group's learning rate:

 scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, 100, verbose=True)
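
The check itself is nothing fancy; here is a sketch of what I mean, on a small dummy model so it is self-contained (the group values are made up, they just need to differ):

    import torch
    import torch.nn as nn
    import torch.optim as optim

    model = nn.Linear(10, 2)

    def make_optimizer():
        # two groups with different learning rates are enough to see the effect
        return optim.SGD([
            {'params': [model.weight], 'lr': 1e-3},
            {'params': [model.bias],   'lr': 2e-2},
        ], lr=1e-3, momentum=0.9)

    opt_a, opt_b = make_optimizer(), make_optimizer()
    sched_a = torch.optim.lr_scheduler.CosineAnnealingLR(opt_a, T_max=100)
    sched_b = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt_b, T_0=5, T_mult=2)

    sched_a.step()
    sched_b.step()

    print([g['lr'] for g in opt_a.param_groups])  # CosineAnnealingLR
    print([g['lr'] for g in opt_b.param_groups])  # CosineAnnealingWarmRestarts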

Then, to find the cause of the problem, I had a look at the source code of the learning rate schedulers: https://github.com/pytorch/pytorch/blob/master/torch/optim/lr_scheduler.py

From a quick look, there are some differences between the get_lr() implementations of CosineAnnealingLR and CosineAnnealingWarmRestarts:

 # CosineAnnealingLR:
 def get_lr(self):
     if not self._get_lr_called_within_step:
         warnings.warn("To get the last learning rate computed by the scheduler, "
                       "please use `get_last_lr()`.", UserWarning)
 
     if self.last_epoch == 0:
         return self.base_lrs
     elif (self.last_epoch - 1 - self.T_max) % (2 * self.T_max) == 0:
         return [group['lr'] + (base_lr - self.eta_min) *
                 (1 - math.cos(math.pi / self.T_max)) / 2
                 for base_lr, group in
                 zip(self.base_lrs, self.optimizer.param_groups)]
     return [(1 + math.cos(math.pi * self.last_epoch / self.T_max)) /
             (1 + math.cos(math.pi * (self.last_epoch - 1) / self.T_max)) *
             (group['lr'] - self.eta_min) + self.eta_min
             for group in self.optimizer.param_groups]    
 
 # CosineAnnealingWarmRestarts:
 def get_lr(self):
     if not self._get_lr_called_within_step:
         warnings.warn("To get the last learning rate computed by the scheduler, "
                       "please use `get_last_lr()`.", UserWarning)
 
     return [self.eta_min + (base_lr - self.eta_min) * (1 + math.cos(math.pi * self.T_cur / self.T_i)) / 2
             for base_lr in self.base_lrs]

So after looking through the code, I think this is a bug. Even the documentation of CosineAnnealingWarmRestarts says that it should "Set the learning rate of each parameter group using a cosine annealing schedule".
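
Until that is fixed, a possible workaround (just a sketch of an idea, not a tested fix) is to treat group 0 as the reference and rescale the other groups after every scheduler step, so the original 1x/2x/10x/20x relationship from the question is preserved:

    # Compute each group's ratio relative to group 0 once, right after building the
    # optimizer, then after every scheduler step propagate group 0's updated lr to
    # the other groups.
    # Note: with eta_min > 0 this keeps exact ratios, which is slightly different
    # from annealing each group independently towards eta_min.
    ratios = [g['lr'] / optimizer.param_groups[0]['lr'] for g in optimizer.param_groups]

    def step_all_groups(scheduler, optimizer):
        scheduler.step()
        base = optimizer.param_groups[0]['lr']
        for group, ratio in zip(optimizer.param_groups, ratios):
            group['lr'] = base * ratio

    # call step_all_groups(scheduler, optimizer) wherever scheduler.step() was used before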

Upvotes: 3
