Konder

Reputation: 53

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not tuple

I am studying NLP and trying to build a model for classifying sentences. I wrap a BERT model in my own class, but I get an error saying that the input must be a Tensor, not a tuple. I am using transformers version 4.21.2.

class BertClassificationModel(nn.Module):
    def __init__(self, bert_model_name, num_labels, dropout=0.1):
        super(BertClassificationModel, self).__init__()
        self.bert = BertForSequenceClassification.from_pretrained(bert_model_name, return_dict=False)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(768, num_labels)
        self.num_labels = num_labels
    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        pooled_output = self.bert(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        return logits

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not tuple

Upvotes: 1

Views: 3372

Answers (2)

cronoik

Reputation: 19355

The issue is that the output of self.bert is not a tensor but a tuple:

from transformers import BertForSequenceClassification, BertTokenizer

bert_model_name = "bert-base-cased"
t = BertTokenizer.from_pretrained(bert_model_name)
m = BertForSequenceClassification.from_pretrained(bert_model_name, return_dict=False)

o = m(**t("test test", return_tensors="pt"))

print(type(o))

Output:

<class 'tuple'>
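The failure itself can be reproduced without BERT. The following is a minimal sketch, assuming only that torch is installed; fake_output is an illustrative stand-in for the tuple that self.bert returns, not the real model output. nn.Dropout expects a tensor, so passing the whole tuple raises a TypeError, while indexing into the tuple first works.

```python
import torch
from torch import nn

dropout = nn.Dropout(0.1)

# Illustrative stand-in for the tuple returned by self.bert(...) with return_dict=False.
fake_output = (torch.zeros(1, 4),)

try:
    dropout(fake_output)  # the whole tuple is passed, as in the question
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Selecting a tensor out of the tuple before calling dropout avoids the error.
result = dropout(fake_output[0])
print(result.shape)
```

This only shows why the call fails; which element of the tuple is the right one to use depends on the model class.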

I personally do not recommend using return_dict=False, as it makes the code harder to read. Changing this parameter does not help in your case anyway, because you want the pooler output, which the classification head of BertForSequenceClassification consumes and does not return (the outputs of BertForSequenceClassification are listed in the transformers documentation).
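To make this concrete, here is a small sketch that compares the two return styles of BertForSequenceClassification. It assumes a tiny, randomly initialised BertConfig (the sizes are arbitrary and chosen only so that nothing has to be downloaded) and made-up token ids, so the values are meaningless; only the output types and fields matter.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Hypothetical tiny configuration, randomly initialised; not a pretrained checkpoint.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=3,
)
model = BertForSequenceClassification(config).eval()
input_ids = torch.tensor([[1, 2, 3, 4]])  # made-up token ids

with torch.no_grad():
    as_tuple = model(input_ids, return_dict=False)
    as_dict = model(input_ids, return_dict=True)

print(type(as_tuple).__name__)            # a plain tuple whose first element is the logits
print(list(as_dict.keys()))               # named fields of the output object
print(hasattr(as_dict, "pooler_output"))  # the pooled output is not exposed here
```

In both cases the classification head has already been applied, so there is no pooled vector left to feed into a custom dropout and linear layer.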

You already wrote in your own answer that you don't intend to use the classification head of BertForSequenceClassification. You can therefore load BertModel directly, instead of initializing BertForSequenceClassification and using only its BERT submodule as you did with BertForSequenceClassification.from_pretrained(bert_model_name, return_dict=True).bert:

from torch import nn
from transformers import BertModel, BertTokenizer

class BertClassificationModel(nn.Module):
    def __init__(self, bert_model_name, num_labels, dropout=0.1):
        super(BertClassificationModel, self).__init__()
        self.bert = BertModel.from_pretrained(bert_model_name)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(768, num_labels)
        self.num_labels = num_labels
    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        pooled_output = self.bert(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids).pooler_output
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        return logits


t = BertTokenizer.from_pretrained("bert-base-cased")
m = BertClassificationModel("bert-base-cased", 4, 0.1)
o = m(**t("test test", return_tensors="pt"))
print(o.shape)

Output:

torch.Size([1, 4])
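As an optional variation, the hard-coded 768 can be read from the model configuration, so the same wrapper also works with BERT checkpoints of a different hidden size. This is only a sketch under stated assumptions: the class name BertClassifierSketch is hypothetical, the wrapper takes an already constructed BertModel, and a tiny randomly initialised config is used so the example runs without a download.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel

class BertClassifierSketch(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, bert, num_labels, dropout=0.1):
        super().__init__()
        self.bert = bert
        self.dropout = nn.Dropout(dropout)
        # Read the hidden size from the config instead of hard-coding 768.
        self.classifier = nn.Linear(bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
        )
        return self.classifier(self.dropout(outputs.pooler_output))

# Tiny random config (arbitrary sizes) standing in for a pretrained checkpoint.
tiny_config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
sketch = BertClassifierSketch(BertModel(tiny_config), num_labels=4).eval()

with torch.no_grad():
    sketch_logits = sketch(torch.tensor([[1, 2, 3, 4]]))  # made-up token ids

print(sketch_logits.shape)
```

With a real checkpoint, BertModel.from_pretrained(bert_model_name) would be passed in place of BertModel(tiny_config).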

Upvotes: 1

Konder

Reputation: 53

I figured out the problem. I wanted to add my own classification layers, but the model returned by BertForSequenceClassification.from_pretrained already includes a classification head. To add your own classification layers, take BertForSequenceClassification.from_pretrained(bert_model_name, return_dict=True).bert instead, which returns the bare BERT model. As a result, my class looks like this:

class BertClassificationModel(nn.Module):
    def __init__(self, bert_model_name, num_labels, dropout=0.1):
        super(BertClassificationModel, self).__init__()
        self.bert = BertForSequenceClassification.from_pretrained(bert_model_name, return_dict=True).bert
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(768, num_labels)
        self.num_labels = num_labels
    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        pooled_output = self.bert(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)['pooler_output']
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        return logits

Upvotes: 0
