DevPy

Reputation: 497

Compare cosine similarity of word with BERT model

Hi, I am looking to generate similar words for a given word using a BERT model, the same way we use most_similar in gensim. The approach I found is:

from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

word = "Hello"
inputs = tokenizer(word, return_tensors="pt")
outputs = model(**inputs)

# pooler_output is the [CLS] token's representation passed through a
# dense + tanh layer; shape (1, 768) for bert-base-uncased
word_vect = outputs.pooler_output.detach().numpy()

Okay, this gives me the embedding for the word the user entered. Can we now compare this embedding against the embeddings of the entire BERT vocabulary using cosine similarity, take the top N closest matches, and then map those embeddings back to words using the vocab.txt file in the model? Is that possible?
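For context, here is a rough sketch of what I have in mind (untested, names are just illustrative). Instead of the pooled output, it uses the model's static input embedding matrix, which gives one comparable 768-dim vector per vocabulary entry:

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Static input embedding table: one row per vocabulary entry,
# shape (vocab_size, hidden_size) = (30522, 768) for bert-base-uncased
vocab_embeddings = model.embeddings.word_embeddings.weight.detach()

# Embedding for the query word (its first sub-token, for simplicity)
word = "hello"
token_id = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(word))[0]
query = vocab_embeddings[token_id]

# Cosine similarity against every row of the vocabulary matrix
sims = torch.nn.functional.cosine_similarity(query.unsqueeze(0), vocab_embeddings)
top = torch.topk(sims, k=6)  # the top match is the word itself

# Map indices back to tokens (this is what vocab.txt holds)
print(tokenizer.convert_ids_to_tokens(top.indices.tolist()))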

Upvotes: 2

Views: 4200

Answers (1)

pavelgein

Reputation: 82

Seems like you need to store embeddings for all words in your vocabulary. After that, you can use standard tools to find the embeddings closest to the target embedding. For example, you can use NearestNeighbors from scikit-learn for exact search. Another option you might consider is HNSW, a data structure specifically designed for fast approximate nearest-neighbour search; Faiss, Facebook's similarity-search library, ships a good HNSW implementation.
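A minimal sketch of both routes, assuming you have already collected the per-token vectors into a float32 array of shape (vocab_size, hidden_size); the random data below is just a stand-in for the real BERT embeddings:

import numpy as np
from sklearn.neighbors import NearestNeighbors
import faiss

rng = np.random.default_rng(0)
vocab_vecs = rng.standard_normal((30522, 768)).astype("float32")  # stand-in for real embeddings
query = vocab_vecs[7592:7593]  # stand-in query vector, shape (1, 768)

# --- Exact search with scikit-learn (fine at BERT vocabulary scale) ---
nn = NearestNeighbors(n_neighbors=10, metric="cosine").fit(vocab_vecs)
dist, idx = nn.kneighbors(query)  # idx[0] holds vocab ids to look up in vocab.txt

# --- Approximate search with a Faiss HNSW index ---
# L2-normalize the vectors so that inner product equals cosine similarity
normed = vocab_vecs / np.linalg.norm(vocab_vecs, axis=1, keepdims=True)
index = faiss.IndexHNSWFlat(768, 32, faiss.METRIC_INNER_PRODUCT)  # 32 = HNSW graph degree M
index.add(normed)
sims, ids = index.search(normed[7592:7593], 10)

Exact brute-force search over a ~30k-entry vocabulary is cheap, so scikit-learn alone is enough here; HNSW pays off when the set of candidate vectors grows to millions.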

Upvotes: 1
