Pablo Sanchez

Reputation: 333

Why does the time to compute a single training step increase over the course of training for some seeds/configurations of ADAM but not for SGD in PyTorch?

I'm working on a PyTorch project using PyTorch Lightning (version 1.8.4) to train a neural network. I've noticed that the time it takes to compute a single training step increases over time for some seeds and configurations of the ADAM optimizer, but not for SGD.

conda create --name my_env python=3.9.12  --no-default-packages
conda activate my_env

pip install torch==1.13.1 torchvision==0.14.1
pip install pytorch-lightning==1.8.4

Here's a figure showing how the per-step training time increases for some configurations of ADAM:

[figure: per-step training time over the course of training for several ADAM configurations]

I'm using PyTorch Lightning with automatic optimization disabled:

    @property
    def automatic_optimization(self):
        return False
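
For reference, an equivalent way to opt into manual optimization (per the Lightning docs) is to set the flag in `__init__`; the module below is only a minimal illustration, not my actual model:

    import pytorch_lightning as pl
    import torch

    class ManualOptModule(pl.LightningModule):  # illustrative name only
        def __init__(self):
            super().__init__()
            # Equivalent to overriding the automatic_optimization property above
            self.automatic_optimization = False
            self.layer = torch.nn.Linear(8, 1)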

Thus, my training_step looks like this:

    def training_step(self, train_batch, batch_idx):
        log_dict = {}

        tic = time.time()
        loss_dict = {}
        # With automatic optimization disabled, the optimizer is fetched and stepped manually
        opt = self.optimizers(use_pl_optimizer=False)
        loss = self(train_batch)
        loss_dict['loss'] = loss

        opt.zero_grad()
        self.manual_backward(loss.mean())
        opt.step()

        self.update_log_dict(log_dict=log_dict, my_dict=loss_dict)

        log_dict['time_step'] = torch.tensor(time.time() - tic)
        return log_dict
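
For what it's worth, since CUDA kernels are launched asynchronously, wall-clock timing around `opt.step()` may not reflect the actual GPU compute time. A small helper like the hypothetical `timed` below (not part of my code) could be used to synchronize before reading the clock:

    import time
    import torch

    def timed(fn):
        """Call fn() and return (result, seconds elapsed), synchronizing CUDA
        so that asynchronously queued kernels are included in the measurement."""
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        tic = time.time()
        result = fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        return result, time.time() - tic

    # Example usage inside training_step:
    # loss, step_seconds = timed(lambda: self(train_batch))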

I'm wondering if anyone has experienced this issue before or has any suggestions for how to address it. Thank you!

Upvotes: 0

Views: 258

Answers (1)

Aniket Maurya

Reputation: 380

I haven't observed this issue. optimizer.step(...) is called outside of training_step (if you don't disable automatic optimization), so the optimizer configuration shouldn't affect the training-step time. Could you provide a reproducible script for this? Also, consider opening an issue on GitHub.
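
A minimal reproduction along these lines would be enough to compare per-step times for Adam and SGD (the toy model, data, and hyperparameters are placeholders, not your actual setup):

    # Sketch of a reproducible script: times each manual-optimization step
    # for Adam vs. SGD on random data using a tiny placeholder model.
    import time
    import torch
    import pytorch_lightning as pl
    from torch.utils.data import DataLoader, TensorDataset

    class TimedModule(pl.LightningModule):
        def __init__(self, optimizer_name):
            super().__init__()
            self.automatic_optimization = False  # manual optimization, as in the question
            self.optimizer_name = optimizer_name
            self.net = torch.nn.Sequential(
                torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
            self.step_times = []

        def forward(self, batch):
            x, y = batch
            return torch.nn.functional.mse_loss(self.net(x), y)

        def training_step(self, batch, batch_idx):
            tic = time.time()
            opt = self.optimizers(use_pl_optimizer=False)
            loss = self(batch)
            opt.zero_grad()
            self.manual_backward(loss)
            opt.step()
            self.step_times.append(time.time() - tic)

        def configure_optimizers(self):
            if self.optimizer_name == "adam":
                return torch.optim.Adam(self.parameters(), lr=1e-3)
            return torch.optim.SGD(self.parameters(), lr=1e-3)

    if __name__ == "__main__":
        data = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
        loader = DataLoader(data, batch_size=32)
        for name in ("adam", "sgd"):
            pl.seed_everything(0)
            model = TimedModule(name)
            trainer = pl.Trainer(max_epochs=5, enable_progress_bar=False, logger=False)
            trainer.fit(model, loader)
            print(name, "mean step time:", sum(model.step_times) / len(model.step_times))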

Upvotes: 0
