Reputation: 145
I'm training an NN and using RMSprop as an optimizer and OneCycleLR as a scheduler. I've been running it like this (in slightly simplified code):
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.00001,
alpha=0.99, eps=1e-08, weight_decay=0.0001, momentum=0.0001, centered=False)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.0005, epochs=epochs)
for epoch in range(epochs):
    model.train()
    for counter, (images, targets) in enumerate(train_loader):
        # Clear gradients from the last run
        optimizer.zero_grad()
        # Run forward pass through the mini-batch
        outputs = model(images)
        # Calculate the losses
        loss = loss_fn(outputs, targets)
        # Calculate the gradients
        loss.backward()
        # Update parameters
        optimizer.step()  # Optimizer before scheduler????
        scheduler.step()
    # Check loss on training set
    test()
Note the optimizer and scheduler calls in each mini-batch. This works, but when I plot the learning rate over the course of training, the curve is very bumpy. I checked the docs again, and this is the example shown for torch.optim.lr_scheduler.OneCycleLR:
>>> data_loader = torch.utils.data.DataLoader(...)
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.01, steps_per_epoch=len(data_loader), epochs=10)
>>> for epoch in range(10):
>>>     for batch in data_loader:
>>>         train_batch(...)
>>>         scheduler.step()
Here, they omit optimizer.step() from the training loop. I thought that made sense: since the optimizer is provided to OneCycleLR at initialization, the scheduler must be taking care of it behind the scenes. But doing so gets me this warning:
UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.
Do I ignore that and trust the pseudocode in the docs? Well, I did, and the model didn't learn anything, so the warning is correct and I put optimizer.step() back in.
This gets to the point that I don't really understand how the optimizer and scheduler interact (edit: how the learning rate in the optimizer interacts with the learning rate in the scheduler). I gather that the optimizer is generally stepped every mini-batch and the scheduler every epoch, but for OneCycleLR the docs want you to step the scheduler every mini-batch as well.
Any guidance (or a good tutorial article) would be appreciated!
Upvotes: 6
Views: 10801
Reputation: 2190
Use optimizer.step() before scheduler.step(). Also, for OneCycleLR you need to call scheduler.step() after every batch - source (PyTorch docs). So your training code is correct, as far as calling step() on the optimizer and the scheduler is concerned.
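To see how the two interact: the scheduler never calls the optimizer for you; each scheduler.step() simply overwrites the learning rate stored in the optimizer's param_groups, and the next optimizer.step() uses whatever value is currently stored there. A minimal sketch (dummy model and made-up numbers, just to illustrate the call order and where the learning rate lives):

import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.00001)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.0005, epochs=5, steps_per_epoch=100)

x = torch.randn(8, 10)
loss = model(x).sum()
loss.backward()

optimizer.step()    # update parameters with the learning rate currently in param_groups
scheduler.step()    # then let the scheduler write the learning rate for the next step
print(optimizer.param_groups[0]["lr"])  # same value as scheduler.get_last_lr()[0]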
Also, in the example you mentioned, they pass the steps_per_epoch parameter, but you haven't done so in your training code. This is also mentioned in the docs, and it might be what's causing the issue in your code.
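Concretely, a sketch of the corrected setup, assuming train_loader and epochs are the same objects used in your loop:

scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=0.0005,
    epochs=epochs,                      # number of epochs you train for
    steps_per_epoch=len(train_loader),  # so the schedule covers every batch
)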
Upvotes: 8