Pablo Sanchez

Reputation: 333

Why does the time to compute a single training step increase over the course of training for some seeds/configurations of ADAM but not for SGD in PyTorch?

I'm working on a PyTorch project using PyTorch Lightning (version 1.8.4) to train a neural network. I've noticed that the time it takes to compute a single training step increases over time for some seeds and configurations of the ADAM optimizer, but not for SGD.

conda create --name my_env python=3.9.12  --no-default-packages
conda activate my_env

pip install torch==1.13.1 torchvision==0.14.1
pip install pytorch-lightning==1.8.4

Here's a figure showing how the per-step training time increases for some configurations of ADAM:

[figure: per-step training time over the course of training for several ADAM configurations]

I'm using PyTorch Lightning with automatic optimization disabled:

    @property
    def automatic_optimization(self):
        return False
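
For reference, an equivalent way to opt into manual optimization (per the Lightning docs) is to set the flag in `__init__`; the module below is only a minimal illustration, not my actual model:

    import pytorch_lightning as pl
    import torch

    class ManualOptModule(pl.LightningModule):  # illustrative name only
        def __init__(self):
            super().__init__()
            # Equivalent to overriding the automatic_optimization property above
            self.automatic_optimization = False
            self.layer = torch.nn.Linear(8, 1)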

Thus, my training_step looks like this:

    def training_step(self, train_batch, batch_idx):
        log_dict = {}

        tic = time.time()
        loss_dict = {}
        # With automatic optimization disabled, the optimizer is fetched and stepped manually
        opt = self.optimizers(use_pl_optimizer=False)
        loss = self(train_batch)
        loss_dict['loss'] = loss

        opt.zero_grad()
        self.manual_backward(loss.mean())
        opt.step()

        self.update_log_dict(log_dict=log_dict, my_dict=loss_dict)

        log_dict['time_step'] = torch.tensor(time.time() - tic)
        return log_dict
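
For what it's worth, since CUDA kernels are launched asynchronously, wall-clock timing around `opt.step()` may not reflect the actual GPU compute time. A small helper like the hypothetical `timed` below (not part of my code) could be used to synchronize before reading the clock:

    import time
    import torch

    def timed(fn):
        """Call fn() and return (result, seconds elapsed), synchronizing CUDA
        so that asynchronously queued kernels are included in the measurement."""
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        tic = time.time()
        result = fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        return result, time.time() - tic

    # Example usage inside training_step:
    # loss, step_seconds = timed(lambda: self(train_batch))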

I'm wondering if anyone has experienced this issue before or has any suggestions for how to address it. Thank you!

Upvotes: 0

Views: 258

Answers (1)

Aniket Maurya

Reputation: 380

I haven't observed this issue. optimizer.step(...) is called outside of training_step (if you don't disable automatic optimization), so the optimizer configuration shouldn't affect the training-step time. Could you provide a reproducible script for this? Also, consider opening an issue on GitHub.
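
A minimal reproduction along these lines would be enough to compare per-step times for Adam and SGD (the toy model, data, and hyperparameters are placeholders, not your actual setup):

    # Sketch of a reproducible script: times each manual-optimization step
    # for Adam vs. SGD on random data using a tiny placeholder model.
    import time
    import torch
    import pytorch_lightning as pl
    from torch.utils.data import DataLoader, TensorDataset

    class TimedModule(pl.LightningModule):
        def __init__(self, optimizer_name):
            super().__init__()
            self.automatic_optimization = False  # manual optimization, as in the question
            self.optimizer_name = optimizer_name
            self.net = torch.nn.Sequential(
                torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
            self.step_times = []

        def forward(self, batch):
            x, y = batch
            return torch.nn.functional.mse_loss(self.net(x), y)

        def training_step(self, batch, batch_idx):
            tic = time.time()
            opt = self.optimizers(use_pl_optimizer=False)
            loss = self(batch)
            opt.zero_grad()
            self.manual_backward(loss)
            opt.step()
            self.step_times.append(time.time() - tic)

        def configure_optimizers(self):
            if self.optimizer_name == "adam":
                return torch.optim.Adam(self.parameters(), lr=1e-3)
            return torch.optim.SGD(self.parameters(), lr=1e-3)

    if __name__ == "__main__":
        data = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
        loader = DataLoader(data, batch_size=32)
        for name in ("adam", "sgd"):
            pl.seed_everything(0)
            model = TimedModule(name)
            trainer = pl.Trainer(max_epochs=5, enable_progress_bar=False, logger=False)
            trainer.fit(model, loader)
            print(name, "mean step time:", sum(model.step_times) / len(model.step_times))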

Upvotes: 0
