Reputation: 333
I'm working on a PyTorch project using PyTorch Lightning (version 1.8.4) to train a neural network. I've noticed that the time it takes to compute a single training step grows over the course of training for some seeds and configurations of the ADAM optimizer, but not for SGD.
conda create --name my_env python=3.9.12 --no-default-packages
conda activate my_env
pip install torch==1.13.1 torchvision==0.14.1
pip install pytorch-lightning==1.8.4
Here's a figure that shows the increase in training time over time for some configurations of ADAM:
I'm using PyTorch Lightning with automatic optimization disabled:
@property
def automatic_optimization(self):
    return False
Thus, my training_step looks like this:
def training_step(self, train_batch, batch_idx):
    # assumes `import time` and `import torch` at module level
    log_dict = {}
    tic = time.time()
    loss_dict = {}

    # Manual optimization: fetch the raw optimizer and step it ourselves
    opt = self.optimizers(use_pl_optimizer=False)

    loss = self(train_batch)
    loss_dict['loss'] = loss

    opt.zero_grad()
    self.manual_backward(loss.mean())
    opt.step()

    self.update_log_dict(log_dict=log_dict, my_dict=loss_dict)
    log_dict['time_step'] = torch.tensor(time.time() - tic)
    return log_dict
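For context, the optimizers are set up in configure_optimizers roughly as follows (a minimal sketch; the optimizer_name attribute and the hyperparameter values are placeholders, not my exact configuration):

def configure_optimizers(self):
    # Sketch only: self.hparams.optimizer_name and the values below
    # stand in for whatever the real config passes in.
    if self.hparams.optimizer_name == "adam":
        return torch.optim.Adam(self.parameters(), lr=1e-3, betas=(0.9, 0.999))
    return torch.optim.SGD(self.parameters(), lr=1e-2, momentum=0.9)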
I'm wondering if anyone has experienced this issue before or has any suggestions for how to address it. Thank you!
Upvotes: 0
Views: 258
Reputation: 380
I haven't observed this issue. optimizer.step(...) is called outside of training_step (if you don't disable automatic optimization), so the optimizer configuration shouldn't affect the training-step time. Could you provide a reproducible script for this? You might also consider opening an issue on GitHub.
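For comparison, this is roughly what the step looks like with automatic optimization left on (a minimal sketch; Lightning then runs zero_grad(), backward() and optimizer.step() for you outside of this hook):

def training_step(self, train_batch, batch_idx):
    # With automatic optimization enabled, just return the loss;
    # Lightning handles zero_grad, backward and optimizer.step elsewhere.
    loss = self(train_batch).mean()
    self.log("loss", loss)
    return loss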
Upvotes: 0