Reputation: 181
I'm trying to build a sarcasm detector with PyTorch Lightning in this Kaggle notebook.
I'm using HuggingFace Transformers for the model.
When I start the training, I get this error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
This is my LightningModule:
class SarcasmTagger(pl.LightningModule):
    def __init__(
        self,
        model_name: str,
        n_classes: int,
        n_training_steps: int,
        n_warmup_steps: int,
    ):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name, return_dict=True)
        #self.bert = BertForSequenceClassification.from_pretrained(model_name, return_dict=True)
        self.classifier = nn.Linear(self.bert.config.hidden_size, n_classes)
        self.n_training_steps = n_training_steps
        self.n_warmup_steps = n_warmup_steps

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        logits = self.classifier(outputs.pooler_output)
        return logits

    def shared_step(self, batch, batch_idx):
        input_ids = batch["input_ids"]
        attention_mask = batch["attention_mask"]
        label = batch["label"].view(-1, 1)
        logits = self(input_ids=input_ids, attention_mask=attention_mask)
        loss = nn.functional.cross_entropy(logits, label)
        return logits, loss, label

    def training_step(self, batch, batch_idx):
        logits, loss, label = self.shared_step(batch, batch_idx)
        self.log("train_loss", loss, prog_bar=True, logger=True)
        return {"loss": loss, "predictions": logits, "label": label}

    def validation_step(self, batch, batch_idx):
        logits, loss, label = self.shared_step(batch, batch_idx)
        self.log("val_loss", loss, prog_bar=True, logger=True)
        return loss

    def test_step(self, batch, batch_idx):
        logits, loss, label = self.shared_step(batch, batch_idx)
        self.log("test_loss", loss, prog_bar=True, logger=True)
        return loss

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=2e-5)
        scheduler = get_linear_schedule_with_warmup(
            optimizer,
            num_warmup_steps=self.n_warmup_steps,
            num_training_steps=self.n_training_steps,
        )
        return dict(
            optimizer=optimizer,
            lr_scheduler=dict(scheduler=scheduler, interval="step"),
        )
As far as I'm aware, this error is related to backpropagation. But I'm not calling detach anywhere in my own code, so I don't see what could be causing it.
What can be happening here?
Upvotes: 0
Views: 1063
Reputation: 181
I found the solution!!!
I first tried changing the package versions according to what I'd read here, so I changed the package installation part to:
!pip install torch==2.0.0+cu117
!pip install pytorch-lightning==1.9.4
!pip install accelerate==0.21.0
!pip install tokenizers==0.13.3
!pip install transformers==4.26.1
But the error was still popping up. So, I thought the error could be related to the optimizer used. My optimizer was this one:
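def configure_optimizers(self):
    optimizer = AdamW(self.parameters(), lr=2e-5)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=self.n_warmup_steps,
        num_training_steps=self.n_training_steps,
    )
    return dict(
        optimizer=optimizer,
        lr_scheduler=dict(scheduler=scheduler, interval="step"),
    )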
When I changed my method to use a simple Adam optimizer:
def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=2e-5)
    return [optimizer]
It worked!
So, the problem seems to be in the combination of transformers.AdamW with a scheduler. With the optimizer changed, the training also works after reverting the package installation to just:
!pip install -q transformers
Since transformers.AdamW is deprecated, I think it is a good idea to change the code to use torch.optim.Adam or torch.optim.AdamW instead.
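If you still want the warmup schedule, it can be kept without transformers.AdamW. Below is a minimal sketch, assuming plain PyTorch only: torch.optim.AdamW combined with a LambdaLR that ramps the learning rate up linearly and then decays it linearly. The helper names linear_warmup_decay and build_optimizer, the step counts, and the small nn.Linear stand-in model are hypothetical and not taken from the notebook.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import LambdaLR


def linear_warmup_decay(n_warmup_steps: int, n_training_steps: int):
    """Return an LR multiplier: linear ramp 0 -> 1, then linear decay 1 -> 0."""
    def lr_lambda(step: int) -> float:
        if step < n_warmup_steps:
            return step / max(1, n_warmup_steps)
        remaining = n_training_steps - step
        return max(0.0, remaining / max(1, n_training_steps - n_warmup_steps))
    return lr_lambda


def build_optimizer(model: nn.Module, n_warmup_steps: int, n_training_steps: int):
    # torch.optim.AdamW replaces the deprecated transformers.AdamW.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    scheduler = LambdaLR(optimizer, linear_warmup_decay(n_warmup_steps, n_training_steps))
    return optimizer, scheduler


# Stand-in model (assumption): the real module would be the BERT-based tagger.
model = nn.Linear(4, 2)
optimizer, scheduler = build_optimizer(model, n_warmup_steps=2, n_training_steps=10)

learning_rates = []
for _ in range(10):
    loss = model(torch.randn(8, 4)).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Record the LR used for this step, then advance the schedule.
    learning_rates.append(optimizer.param_groups[0]["lr"])
    scheduler.step()
```

Inside a LightningModule, configure_optimizers could return the same pair as {"optimizer": optimizer, "lr_scheduler": {"scheduler": scheduler, "interval": "step"}} so that the schedule advances once per training step.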
So, to summarize: the new version of transformers might have introduced a bug in AdamW that makes the tensors lose their gradient function, probably something like a detach somewhere in its code.
Either way, as far as I can tell, this is a bug in transformers.AdamW.
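For anyone debugging the same message, a small check like the one below may help show whether the loss is still attached to the autograd graph. It is only a sketch under stated assumptions: a stand-in nn.Linear model and random data instead of the notebook's BERT model, with the error reproduced deliberately by computing the loss under torch.no_grad(). It does not confirm where the gradient tracking was lost in the original setup.

```python
import torch
from torch import nn

# Stand-in model and data (assumptions, not the notebook's BERT setup).
model = nn.Linear(4, 2)
inputs = torch.randn(8, 4)
labels = torch.randint(0, 2, (8,))  # 1-D class indices, as cross_entropy expects

# Normal case: the loss is connected to the graph and can be backpropagated.
loss = nn.functional.cross_entropy(model(inputs), labels)
attached = loss.requires_grad and loss.grad_fn is not None

# Failing case: with gradient tracking disabled, the loss has no grad_fn.
with torch.no_grad():
    detached_loss = nn.functional.cross_entropy(model(inputs), labels)

try:
    detached_loss.backward()
    error_message = ""
except RuntimeError as err:
    error_message = str(err)

print("grad enabled globally:", torch.is_grad_enabled())
print("loss attached to graph:", attached)
print("error from detached loss:", error_message)
```

If loss.requires_grad is already False inside training_step, the graph was cut before the optimizer was ever involved, for example by disabled gradient tracking or by parameters with requires_grad set to False.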
Upvotes: 1