Enrico Shippole

Reputation: 97

Get attention masks from HF pipelines

How should the attention masks returned by Hugging Face's FeatureExtractionPipeline be accessed?

The code below takes an embedding model, distributes it and a Hugging Face dataset across 8 GPUs on a single node, and performs inference on the inputs. The mean-pooling step requires the attention masks.

Code example:

import torch

from accelerate import Accelerator
from accelerate.utils import tqdm
from datasets import load_dataset
from optimum.bettertransformer import BetterTransformer
from transformers import AutoTokenizer, AutoModel, pipeline

accelerator = Accelerator()

model_name = "BAAI/bge-large-en-v1.5"

tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModel.from_pretrained(model_name)

pipe = pipeline(
    "feature-extraction",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    truncation=True,
    padding=True,
    pad_to_max_length=True,
    batch_size=256,
    framework="pt",
    return_tensors=True,
    return_attention_mask=True,
    device=accelerator.device
)

dataset = load_dataset(
    "wikitext",
    "wikitext-2-v1",
    split="train",
)

# Mean pooling - take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element of model_output holds the token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Assume 8 processes

with accelerator.split_between_processes(dataset["text"]) as data:
    for out in pipe(data):
        sentence_embeddings = mean_pooling(out, out["attention_mask"])

I need the attention masks from pipe to use for mean pooling.

Best,

Enrico

Upvotes: 1

Views: 410

Answers (2)

Joseph Catrambone

Reputation: 302

You can often directly access the tokenizer from the pipe and call it with your string to get the attention mask:

>>> pipe.tokenizer("Blah blah blah.")
{'input_ids': [101, 27984, 27984, 27984, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1]}

>>> pipe.tokenizer("Blah blah blah.")['attention_mask']
[1, 1, 1, 1, 1, 1]

But even if that's not an option, it looks like you have access to the tokenizer at initialization. Why not use that directly?
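For instance, here is a rough, untested sketch along those lines (it reuses the mean_pooling, accelerator, and pipe from the question; the texts list is just a placeholder batch):

# Tokenize a batch yourself so the attention masks stay available,
# then run the pipeline's underlying model and reuse mean_pooling().
texts = ["First sentence.", "Second, slightly longer sentence."]

encoded = pipe.tokenizer(
    texts,
    max_length=512,
    truncation=True,
    padding=True,
    return_tensors="pt",
).to(accelerator.device)

with torch.no_grad():
    model_output = pipe.model(**encoded)

sentence_embeddings = mean_pooling(model_output, encoded["attention_mask"])

Since pipe.model and pipe.tokenizer are the same objects you passed at initialization, the masks line up with exactly what the pipeline would have fed the model.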

Upvotes: 0

druskacik

Reputation: 2497

The pipeline object from the transformers library is a convenient abstraction for quick model inference, but for more customized use cases it is usually better to use the model directly. For example:

text = 'This is a test.'

# Tokenize manually so the attention mask stays available
tokenized = tokenizer(
    text,
    max_length=512,
    truncation=True,
    padding=True,
    return_attention_mask=True,
    return_tensors='pt').to(accelerator.device)

out = model(**tokenized)

embeddings = out.last_hidden_state            # token-level embeddings
attention_mask = tokenized['attention_mask']  # mask to use for mean pooling

You can then use embeddings and attention_mask to compute the mean pooling. You might also consider using out.pooler_output instead of computing the mean pooling manually; however, I am not sure how pooler_output is computed for this model, so be wary.
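As a rough, untested sketch of how this could slot into the distributed loop from the question (it reuses your mean_pooling, tokenizer, model, and accelerator, and assumes a batch size of 256 as in your pipeline call):

# Tokenize each batch manually instead of relying on the pipeline output,
# so the attention masks are available for mean pooling.
model = model.to(accelerator.device)
model.eval()

with accelerator.split_between_processes(dataset["text"]) as data:
    for i in range(0, len(data), 256):
        batch = data[i : i + 256]
        tokenized = tokenizer(
            batch,
            max_length=512,
            truncation=True,
            padding=True,
            return_tensors="pt",
        ).to(accelerator.device)
        with torch.no_grad():
            out = model(**tokenized)
        # mean_pooling() indexes model_output[0], i.e. the last_hidden_state
        sentence_embeddings = mean_pooling(out, tokenized["attention_mask"])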

Upvotes: 0
