How to use a batch size bigger than zero in Bert Sequence Classification

Question

Hugging Face documentation describes how to do a sequence classification using a Bert model:

from transformers import BertTokenizer, BertForSequenceClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1
outputs = model(input_ids, labels=labels)

loss, logits = outputs[:2]

However, there is only example for batch size 1. How to implement it when we have a list of phrases and want to use a bigger batch size?

Lysandre · Accepted Answer

in that example unsqueeze is used to add a dimension to the input/labels, so that it is an array of size (batch_size, sequence_length). If you want to use a batch size > 1, you can build an array of sequences instead, like in the following example:

from transformers import BertTokenizer, BertForSequenceClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

sequences = ["Hello, my dog is cute", "My dog is cute as well"]
input_ids = torch.tensor([tokenizer.encode(sequence, add_special_tokens=True) for sequence in sequences])
labels = torch.tensor([[1], [0]]) # Labels depend on the task
outputs = model(input_ids, labels=labels)

loss, logits = outputs[:2]

In that example, both sequences get encoded in the same number of tokens so it's easy to build a tensor containing both sequences, but if they have a differing amount of elements you would need to pad the sequences and tell the model which tokens it should attend to (so that it ignores the padded values) using an attention mask.

There is an entry in the glossary concerning attention masks which explains their purpose and usage. You pass this attention mask to the model when calling its forward method.

How to use a batch size bigger than zero in Bert Sequence Classification

Answers (1)

Related Questions