Reputation: 145
I'm training an NN and using RMSprop as an optimizer and OneCycleLR as a scheduler. I've been running it like this (in slightly simplified code):
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.00001,
alpha=0.99, eps=1e-08, weight_decay=0.0001, momentum=0.0001, centered=False)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.0005, epochs=epochs)
for epoch in range(epochs):
    model.train()
    for counter, (images, targets) in enumerate(train_loader):
        # Clear gradients from the last run
        optimizer.zero_grad()
        # Run forward pass through the mini-batch
        outputs = model(images)
        # Calculate the losses
        loss = loss_fn(outputs, targets)
        # Calculate the gradients
        loss.backward()
        # Update parameters
        optimizer.step()  # Optimizer before scheduler????
        scheduler.step()
    # Check loss on training set
    test()
Note the optimizer and scheduler calls in each mini-batch. This works, but when I plot the learning rate over the course of training, the curve is very bumpy. I checked the docs again, and this is the example shown for torch.optim.lr_scheduler.OneCycleLR:
>>> data_loader = torch.utils.data.DataLoader(...)
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.01, steps_per_epoch=len(data_loader), epochs=10)
>>> for epoch in range(10):
>>>     for batch in data_loader:
>>>         train_batch(...)
>>>         scheduler.step()
Here, they omit optimizer.step() from the training loop. I thought that made sense: since the optimizer is provided to OneCycleLR at initialization, the scheduler must be taking care of it behind the scenes. But doing so gets me this warning:
UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.
Do I ignore that and trust the pseudocode in the docs? Well, I did, and the model didn't learn anything, so the warning is correct and I put optimizer.step() back in.
This gets to the point that I don't really understand how the optimizer and scheduler interact (edit: how the learning rate in the optimizer interacts with the learning rate in the scheduler). I gather that the optimizer is generally stepped every mini-batch and the scheduler every epoch, but for OneCycleLR the docs want you to step the scheduler every mini-batch as well.
Any guidance (or a good tutorial article) would be appreciated!
Upvotes: 6
Views: 10801
Reputation: 2190
Use optimizer.step() before scheduler.step(). Also, for OneCycleLR you need to call scheduler.step() after every batch - source (PyTorch docs). So your training code is correct, as far as calling step() on the optimizer and the scheduler is concerned.
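To see how the two interact: the scheduler never calls the optimizer for you; each scheduler.step() simply overwrites the learning rate stored in the optimizer's param_groups, and the next optimizer.step() uses whatever value is currently stored there. A minimal sketch (dummy model and made-up numbers, just to illustrate the call order and where the learning rate lives):

import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.00001)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.0005, epochs=5, steps_per_epoch=100)

x = torch.randn(8, 10)
loss = model(x).sum()
loss.backward()

optimizer.step()    # update parameters with the learning rate currently in param_groups
scheduler.step()    # then let the scheduler write the learning rate for the next step
print(optimizer.param_groups[0]["lr"])  # same value as scheduler.get_last_lr()[0]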
Also, in the example you mentioned, they pass the steps_per_epoch parameter, but you haven't done so in your training code. This is also mentioned in the docs, and it might be what's causing the issue in your code.
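Concretely, a sketch of the corrected setup, assuming train_loader and epochs are the same objects used in your loop:

scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=0.0005,
    epochs=epochs,                      # number of epochs you train for
    steps_per_epoch=len(train_loader),  # so the schedule covers every batch
)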
Upvotes: 8