lcoandrade

Reputation: 181

(HuggingFace Transformers) NLP with RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I'm trying to make a Sarcasm detector with Lightning in this Kaggle notebook.

I'm using HuggingFace Transformers to achieve this.

When I start the training, I get this error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

This is my LightningModule:

class SarcasmTagger(pl.LightningModule):

    def __init__(
        self,
        model_name: str,
        n_classes: int,
        n_training_steps=None,
        n_warmup_steps=None
    ):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name, return_dict=True)
        #self.bert = BertForSequenceClassification.from_pretrained(model_name, return_dict=True)
        self.classifier = nn.Linear(self.bert.config.hidden_size, n_classes)
        self.n_training_steps = n_training_steps
        self.n_warmup_steps = n_warmup_steps

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        #print(outputs)
        logits = self.classifier(outputs.pooler_output)
        return logits

    def shared_step(self, batch, batch_idx):
        input_ids = batch["input_ids"]
        attention_mask = batch["attention_mask"]
        label = batch["label"].view(-1, 1)
        logits = self(input_ids=input_ids, attention_mask=attention_mask)
        loss = nn.functional.cross_entropy(logits, label)
        return logits, loss, label

    def training_step(self, batch, batch_idx):
        logits, loss, label = self.shared_step(batch, batch_idx)
        self.log("train_loss", loss, prog_bar=True, logger=True)
        return {"loss": loss, "predictions": logits, "label": label}

    def validation_step(self, batch, batch_idx):
        logits, loss, label = self.shared_step(batch, batch_idx)
        self.log("val_loss", loss, prog_bar=True, logger=True)
        return loss

    def test_step(self, batch, batch_idx):
        logits, loss, label = self.shared_step(batch, batch_idx)
        self.log("test_loss", loss, prog_bar=True, logger=True)
        return loss

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=2e-5)

        scheduler = get_linear_schedule_with_warmup(
            optimizer,
            num_warmup_steps=self.n_warmup_steps,
            num_training_steps=self.n_training_steps
        )

        return dict(
            optimizer=optimizer,
            lr_scheduler=dict(
                scheduler=scheduler,
                interval='step')
        )

As far as I'm aware, this error is related to backpropagation, but I'm not calling detach anywhere in my code that could cause this issue.

What could be happening here?

Upvotes: 0

Views: 1063

Answers (1)

lcoandrade

Reputation: 181

I found the solution!!!

First I tried to change the package versions according to what I've read here. Therefore, I changed my install packages part to:

!pip install torch==2.0.0+cu117
!pip install pytorch-lightning==1.9.4
!pip install accelerate==0.21.0
!pip install tokenizers==0.13.3
!pip install transformers==4.26.1

But the error was still popping up, so I thought it could be related to the optimizer I was using. My optimizer was this one:

def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=2e-5)

        scheduler = get_linear_schedule_with_warmup(
          optimizer,
          num_warmup_steps=self.n_warmup_steps,
          num_training_steps=self.n_training_steps
        )

        return dict(
            optimizer=optimizer,
            lr_scheduler=dict(
                scheduler=scheduler,
                interval='step')
        )

When I changed my method to use a simple Adam optimizer:

def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=2e-5)
        return [optimizer]

It worked!

So, the problem is in transformers' AdamW combined with a scheduler. Reverting the install packages to just:

!pip install -q transformers

Makes the training work.

Since transformers.AdamW is deprecated, I think it is a good idea to change the code to use torch.optim.Adam or torch.optim.AdamW instead.
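For instance, keeping the warmup scheduler but swapping in torch's AdamW would look roughly like this. This is just a sketch based on the code above (it assumes the same n_warmup_steps / n_training_steps attributes), and I haven't re-tested this exact variant:

from torch.optim import AdamW  # torch's AdamW, not transformers.AdamW
from transformers import get_linear_schedule_with_warmup

def configure_optimizers(self):
    # Same setup as before, but with the torch optimizer
    optimizer = AdamW(self.parameters(), lr=2e-5)

    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=self.n_warmup_steps,
        num_training_steps=self.n_training_steps
    )

    return dict(
        optimizer=optimizer,
        lr_scheduler=dict(
            scheduler=scheduler,
            interval='step')
    )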

So, summarizing: the newer version of transformers might have introduced a bug in its AdamW that makes the tensors lose their grad function, probably something like a detach somewhere in the code.

But, anyway, this looks like a bug in transformers.AdamW.
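Just to illustrate what a stray detach does (this is a standalone toy snippet, not code from the notebook): calling backward() on a tensor that has been detached from the graph raises exactly this error:

import torch

x = torch.randn(3, requires_grad=True)
loss = x.sum().detach()  # detach() drops the grad_fn
loss.backward()          # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn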

Upvotes: 1
