rbaleksandar

Reputation: 9701

How to fix "initial_lr not specified when resuming optimizer" error for scheduler?

In PyTorch I have configured SGD like this:

sgd_config = {
    'params' : net.parameters(),
    'lr' : 1e-7,
    'weight_decay' : 5e-4,
    'momentum' : 0.9
}
optimizer = SGD(**sgd_config)

My requirement is to decrease the learning rate by a factor of 0.1 at epoch 30 and again at epoch 60, so over 100 epochs the learning rate is reduced twice.

I read about the learning rate schedulers available in torch.optim.lr_scheduler, so I decided to try one instead of manually adjusting the learning rate:

scheduler = lr_scheduler.StepLR(optimizer, step_size=30, last_epoch=60, gamma=0.1)

However, I am getting:

Traceback (most recent call last):
  File "D:\Projects\network\network_full.py", line 370, in <module>
    scheduler = lr_scheduler.StepLR(optimizer, step_size=30, last_epoch=90, gamma=0.1)
  File "D:\env\test\lib\site-packages\torch\optim\lr_scheduler.py", line 367, in __init__
    super(StepLR, self).__init__(optimizer, last_epoch, verbose)
  File "D:\env\test\lib\site-packages\torch\optim\lr_scheduler.py", line 39, in __init__
    raise KeyError("param 'initial_lr' is not specified "
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"

I read a post here, and I still don't get how I would use the scheduler for my scenario. Maybe I am just not understanding the definition of last_epoch, given that the documentation is very brief on this parameter:

last_epoch (int) – The index of last epoch. Default: -1.

Since the argument is exposed to the user and there is no explicit prohibition on using a scheduler for fewer epochs than the optimizer itself, I am starting to think it's a bug.

Upvotes: 3

Views: 7670

Answers (4)

karthikeyanc2

Reputation: 71

last_epoch must be -1 unless you are resuming training. If you are trying to resume training, the problem is that you created the scheduler before loading the optimizer's saved state.

Correct procedure to resume:

    import torch
    from torch.optim import SGD, lr_scheduler

    sgd_config = {
        'params' : net.parameters(),
        'lr' : 1e-7,
        'weight_decay' : 5e-4,
        'momentum' : 0.9
    }
    optimizer = SGD(**sgd_config)
    # Restore the saved optimizer state *before* creating the scheduler
    optimizer.load_state_dict(torch.load('your_save_optimizer_params.pt'))
    scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1, last_epoch=50)
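
For completeness, here is a minimal sketch (my own, not part of the original answer) of how the resumed run could continue, assuming the checkpoint above was written by a previous run that already used a scheduler; train_one_epoch and the scheduler checkpoint file name are placeholders:

    # Assumed continuation: resume at epoch 51 and train up to epoch 100.
    start_epoch = 51                      # the scheduler above resumed with last_epoch=50
    num_epochs = 100

    for epoch in range(start_epoch, num_epochs):
        train_one_epoch(net, optimizer)   # placeholder for the actual training code
        scheduler.step()                  # StepLR multiplies the LR by 0.1 every 30 epochs
        # save everything needed to resume again later
        torch.save(optimizer.state_dict(), 'your_save_optimizer_params.pt')
        torch.save(scheduler.state_dict(), 'your_save_scheduler_params.pt')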

Upvotes: 7

hao zhang

Reputation: 1

sgd_config = {
    'params' : net.parameters(),
    'initial_lr' : 1e-7,   # extra key ends up in optimizer.param_groups[0]
    'lr' : 1e-7,
    'weight_decay' : 5e-4,
    'momentum' : 0.9
}
# Pass the dict as a single param group rather than as keyword arguments,
# since SGD's constructor does not accept an 'initial_lr' keyword.
optimizer = SGD([sgd_config], lr=1e-7)

Upvotes: 0

russian_spy

Reputation: 6655

You have to set last_epoch=60 with a separate statement, shown here in diff format:

<< scheduler = lr_scheduler.StepLR(optimizer, step_size=30, last_epoch=60, gamma=0.1)

>> scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
>> scheduler.last_epoch = 60

Use this to inspect scheduler values:

print(scheduler.state_dict())

{'step_size': 30, 'gamma': 0.1, 'base_lrs': [0.0002], 'last_epoch': 4, '_step_count': 5, 'verbose': False, '_get_lr_called_within_step': False, '_last_lr': [0.0002]}

Upvotes: 0

ShinyDemon

Reputation: 43

You have misunderstood the last_epoch argument and you are not using the correct learning rate scheduler for your requirements.

This should work:

optim.lr_scheduler.MultiStepLR(optimizer, milestones=[0, 30, 60], gamma=0.1, last_epoch=args.current_epoch - 1)

The last_epoch argument makes sure the correct LR is used when resuming training. It defaults to -1, i.e. the epoch before epoch 0.
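
As an illustration of this approach (my own sketch, not part of the answer above), a fresh 100-epoch run matching the question's requirement could look like the following, with milestones=[30, 60] for the two decays the question describes; net comes from the question and train_one_epoch is a placeholder:

from torch.optim import SGD, lr_scheduler

optimizer = SGD(net.parameters(), lr=1e-7, weight_decay=5e-4, momentum=0.9)
# Fresh run: leave last_epoch at its default of -1 and let the scheduler
# multiply the LR by 0.1 at epochs 30 and 60.
scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)

for epoch in range(100):
    train_one_epoch(net, optimizer)   # placeholder for the actual training code
    scheduler.step()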

Upvotes: 1
