Dilrukshi Perera

Reputation: 997

How to get the probability of a particular token (word) in a sentence given the context

I'm trying to calculate the probability, or any kind of score, for words in a sentence using NLP. I've tried this with the GPT-2 model using the Hugging Face Transformers library, but I couldn't get satisfactory results: because the model is unidirectional, it didn't seem to predict within the full context. So I was wondering whether there is a way to calculate this using BERT, since it is bidirectional.

I came across this related post the other day, but none of its answers were useful for my case either.

I hope to receive some ideas or a solution for this. Any help is appreciated. Thank you.

Upvotes: 5

Views: 5650

Answers (1)

Jindřich

Reputation: 11240

BERT is trained as a masked language model, i.e., it is trained to predict tokens that were replaced by a [MASK] token.

import torch
from transformers import AutoTokenizer, BertForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-cased")
bert = BertForMaskedLM.from_pretrained("bert-base-cased")

# Encode a sentence with one [MASK] token in place of the word to predict.
input_idx = tok.encode(f"The {tok.mask_token} were the best rock band ever.")
with torch.no_grad():
    logits = bert(torch.tensor([input_idx]))[0]

# Most likely token at each position; index 2 is the [MASK]
# ([CLS] = 0, "The" = 1, [MASK] = 2).
prediction = logits[0].argmax(dim=1)
print(tok.convert_ids_to_tokens(prediction[2].numpy().tolist()))

It prints token no. 11581, which is:

Beatles

To get a normalized probability distribution over BERT's vocabulary, you can apply the softmax function to the logits, i.e., F.softmax(logits, dim=-1) (assuming the standard import torch.nn.functional as F).
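Putting that together with the snippet above, a minimal sketch of how one might read off the probability of a particular word at the masked position (reusing tok, bert and logits from the code above; this assumes the word maps to a single wordpiece in BERT's vocabulary):

import torch.nn.functional as F

# Probability distribution over the vocabulary at the masked position (index 2).
probs = F.softmax(logits[0, 2], dim=-1)

# Probability of one candidate word (assumes it is a single wordpiece).
word_id = tok.convert_tokens_to_ids("Beatles")
print(probs[word_id].item())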

The tricky part is that words might be split into multiple subwords. You can simulate that by adding multiple [MASK] tokens, but then you have the problem of how to reliably compare scores for predictions of different lengths. I would probably average the probabilities, but maybe there is a better way; a rough sketch of that idea follows below.
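A rough sketch of that multi-mask approach, assuming the candidate word is split with tok.tokenize, each wordpiece gets its own [MASK], and averaging the per-wordpiece probabilities is an acceptable heuristic; score_word and its prefix/suffix arguments are made up for illustration, not part of the original answer:

import torch
import torch.nn.functional as F

def score_word(word, prefix="The", suffix="were the best rock band ever."):
    # Split the candidate word into wordpieces and put one [MASK] per piece.
    pieces = tok.tokenize(word)
    masked = f"{prefix} {' '.join([tok.mask_token] * len(pieces))} {suffix}"
    input_ids = tok.encode(masked, return_tensors="pt")

    with torch.no_grad():
        logits = bert(input_ids)[0]
    probs = F.softmax(logits[0], dim=-1)

    # Positions of the [MASK] tokens in the input.
    mask_positions = (input_ids[0] == tok.mask_token_id).nonzero(as_tuple=True)[0]

    # Average the probabilities the model assigns to each wordpiece.
    piece_ids = tok.convert_tokens_to_ids(pieces)
    piece_probs = [probs[pos, pid].item() for pos, pid in zip(mask_positions, piece_ids)]
    return sum(piece_probs) / len(piece_probs)

print(score_word("Beatles"))
print(score_word("Rolling Stones"))

Whether averaging is the right way to combine the per-wordpiece probabilities is an open question, as the answer notes; it only serves to make words of different subword lengths roughly comparable.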

Upvotes: 7
