Reputation: 9
I'm trying to obtain sentence embeddings from BERT, but I'm not quite sure if I'm doing it properly... and yes, I'm aware that tools such as bert-as-service already exist, but I want to do it myself and understand how it works.
Let's say I want to build a sentence embedding from the word embeddings of the sentence "I am.". As I understand it, BERT outputs hidden states in the form (12, seq_length, 768), one per encoder layer. I extracted each word embedding from the last encoder layer, each of shape (1, 768). My doubt now lies in getting the sentence embedding from these two word vectors. If I have a (2, 768) tensor, should I sum over dim=0 and obtain a vector of shape (1, 768)? Or should I concatenate the two word vectors into (1, 1536) and apply (mean) pooling to get a sentence vector of shape (1, 768)? I'm not sure what the right approach to obtain the sentence vector for this example is.
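For reference, here is a minimal sketch of the mean-pooling approach using the Hugging Face transformers library (the library, the bert-base-uncased checkpoint, and the pooling choice are assumptions for illustration, not something prescribed by BERT itself):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# The tokenizer adds [CLS] and [SEP] around the word pieces of "I am.".
inputs = tokenizer("I am.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Last encoder layer: shape (1, seq_length, 768).
last_hidden = outputs.last_hidden_state

# Mean pooling over the token dimension (dim=1) gives (1, 768); note this
# averages over all tokens, including the [CLS] and [SEP] special tokens.
# Summing over the same dimension would also give (1, 768), just unscaled.
sentence_embedding = last_hidden.mean(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```

For batched inputs with padding, you would normally use the attention_mask to exclude padded positions from the average.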
Upvotes: 0
Views: 2576
Reputation: 61
As far as I know, BERT has a comment in its source code:
For classification tasks, the first vector (corresponding to [CLS]) is used as the "sentence vector." Note that this only makes sense because the entire model is fine-tuned.
So the [CLS] vector is what BERT provides for sentence embeddings, without any combination or processing of all the word vectors in the sentence.
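For illustration, here is a minimal sketch of extracting that [CLS] vector with the Hugging Face transformers library (the library and the bert-base-uncased checkpoint are my assumptions; the quoted comment comes from BERT's original source code, not from this library):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("I am.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# [CLS] is always the first token, so take index 0 along the sequence axis.
cls_embedding = outputs.last_hidden_state[:, 0, :]  # shape: (1, 768)
```

As the quoted comment warns, this vector is only really meaningful as a sentence representation once the model has been fine-tuned on a downstream task.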
Upvotes: 4