hamzaouni

Reputation: 3

RuntimeError: "element 0 of tensors does not require grad and does not have a grad_fn"

I am facing an issue while training a comment classification model using PyTorch Lightning with a pre-trained BERT model.

I encountered the following error during the training process:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

For context, I have already enabled gradients for all of the model's parameters using the enable_gradients(model) function shown below, but the error still persists.

The model is based on the aubmindlab/bert-base-arabertv02-twitter pre-trained model, and I noticed a warning that some weights of the BERT model were not initialized from the checkpoint when it was loaded. I have made sure I am using the latest versions of PyTorch, Transformers, and PyTorch Lightning.

I also attempted to fine-tune the BERT model on a downstream task before training my specific model, but the error remains unresolved.

How can I resolve this issue?

This is my code:

import torch
from pytorch_lightning import Trainer

def enable_gradients(model):
    # Ensure every parameter participates in autograd
    for param in model.parameters():
        param.requires_grad = True

# datamodule
ucc_data_module = UCC_Data_Module(train_path, val_path, test_path, attributes=attributes, batch_size=config['batch_size'])
ucc_data_module.setup()

# model
model = UCC_Comment_Classifier()

enable_gradients(model)

# Instantiate the Lightning Trainer and fit
trainer = Trainer(max_epochs=config['n_epochs'], accelerator='gpu', num_sanity_val_steps=1)

try:
    trainer.fit(model, ucc_data_module)
    torch.save(model.state_dict(), PATH)
except RuntimeError as e:
    print(e)

This is the error:

ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/multiprocessing.py", line 147, in _wrapping_function
    results = function(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 568, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 973, in _run
    results = self._run_stage()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1016, in _run_stage
    self.fit_loop.run()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 201, in run
    self.advance()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 354, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 133, in run
    self.advance(data_fetcher)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 218, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 185, in run
    self._optimizer_step(kwargs.get("batch_idx", 0), closure)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 260, in _optimizer_step
    call._call_lightning_module_hook(
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 144, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1256, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 155, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 256, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 225, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 114, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/optimization.py", line 439, in step
    loss = closure()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 101, in _wrap_closure
    closure_result = closure()
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 140, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 135, in closure
    self._backward_fn(step_output.closure_loss)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 232, in backward_fn
    call._call_strategy_hook(self.trainer, "backward", loss, optimizer)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 291, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 200, in backward
    self.precision_plugin.backward(closure_loss, self.lightning_module, optimizer, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 67, in backward
    model.backward(tensor, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1046, in backward
    loss.backward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Upvotes: 0

Views: 754

Answers (1)

V12

Reputation: 120

A workaround is to call torch.set_grad_enabled(True) at the beginning of your training_step, or to use the AdamW optimizer from torch.optim instead of the one from transformers.
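A minimal sketch of what that could look like in the LightningModule. This is not your actual UCC_Comment_Classifier: the stand-in linear layer, the (x, y) batch format, and the learning rate are placeholders, since your model's internals were not posted.

import torch
import torch.nn as nn
import pytorch_lightning as pl

class UCC_Comment_Classifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Placeholder for the real BERT backbone + classification head
        self.layer = nn.Linear(768, 2)

    def training_step(self, batch, batch_idx):
        # Workaround: explicitly re-enable autograd inside the step, in case
        # something upstream (a stray torch.no_grad() or inference context)
        # turned it off before the loss was computed.
        torch.set_grad_enabled(True)
        x, y = batch  # assumed batch format
        logits = self.layer(x)
        loss = nn.functional.cross_entropy(logits, y)
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        # Alternative fix: torch's own AdamW rather than the deprecated
        # transformers.optimization.AdamW (visible in your traceback);
        # lr=2e-5 is an assumed placeholder.
        return torch.optim.AdamW(self.parameters(), lr=2e-5)

Note that if the loss itself is detached from the graph (for example computed under torch.no_grad(), or built from .item() / .detach() values), neither workaround will help: the loss returned from training_step must carry a grad_fn.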

Upvotes: -1
