buddy

Reputation: 197

How to obtain contextual embedding for a phrase in a sentence using BERT?

I use https://github.com/UKPLab/sentence-transformers to obtain sentence embeddings from BERT. With it I can obtain embeddings for sentences or phrases. For example, I can get the embedding of a sentence like "system not working given to service center but no response on replacement". I can also get the embedding of a phrase like "no response".

However, I want to get the embedding of "no response" in the context of "system not working given to service center but no response on replacement". Any pointers on how to obtain this would be helpful. Thanks in advance.

I am trying to do this because the phrase "no response" appears in different contexts in different sentences. For example, its context differs between the following two sentences: "system not working given to service center but no response on replacement" and "we tried recovery procedure on the patient but there was no response".

Upvotes: 2

Views: 2337

Answers (2)

Franck Dernoncourt

Reputation: 83187

For better phrase embeddings, you can try Phrase-BERT.

The paper also mentions related previous work, e.g. SentBERT and SpanBERT.

I believe these are not conditioned on the surrounding sentence, though.

Upvotes: 1

Jindřich

Reputation: 11240

BERT returns one vector per input sub-word, so you need to get the vectors that correspond to the phrase you are interested in.

What is usually called a sentence embedding is either the embedding of the technical symbol [CLS], which is prepended to the sentence before it is processed by BERT, or an average of the contextual sub-word vectors. Because the [CLS] vector necessarily covers the entire sentence, you cannot get it for just a sub-phrase, but you can use the average of the sub-word embeddings of the phrase.

The package you are using, sentence-transformers, has a very simple, user-friendly API, but I am afraid it is not flexible enough for this job. I'd suggest using Hugging Face's Transformers. This package lets you see how the sentence was tokenized and thus obtain the corresponding vectors.
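A minimal sketch of that idea, under a few assumptions: the model name `bert-base-uncased` is only an example, the helper names `phrase_token_indices`, `mean_pool`, and `embed_phrase` are made up for illustration, and the tokenizer must be one of the "fast" tokenizers, since only those can return character offsets. The phrase is located by its character span in the sentence, the sub-word tokens overlapping that span are selected, and their last-layer vectors are averaged.

```python
from typing import List, Sequence, Tuple


def phrase_token_indices(
    offsets: Sequence[Tuple[int, int]], char_start: int, char_end: int
) -> List[int]:
    """Indices of tokens whose character span overlaps [char_start, char_end)."""
    return [
        i
        for i, (tok_start, tok_end) in enumerate(offsets)
        # Special tokens such as [CLS] and [SEP] get the empty span (0, 0).
        if tok_end > tok_start and tok_start < char_end and tok_end > char_start
    ]


def mean_pool(vectors: Sequence[Sequence[float]], indices: Sequence[int]) -> List[float]:
    """Average the selected rows of a (tokens x dims) matrix."""
    if not indices:
        raise ValueError("phrase does not overlap any token")
    dims = len(vectors[indices[0]])
    return [sum(vectors[i][d] for i in indices) / len(indices) for d in range(dims)]


def embed_phrase(
    sentence: str, phrase: str, model_name: str = "bert-base-uncased"
) -> List[float]:
    """Contextual embedding of the first occurrence of `phrase` in `sentence`."""
    # Imported here so the two helpers above work without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    char_start = sentence.index(phrase)  # raises ValueError if the phrase is absent
    char_end = char_start + len(phrase)

    tokenizer = AutoTokenizer.from_pretrained(model_name)  # assumed to be a fast tokenizer
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    encoded = tokenizer(sentence, return_offsets_mapping=True, return_tensors="pt")
    # The model does not accept offsets, so remove them before the forward pass.
    offsets = [tuple(pair) for pair in encoded.pop("offset_mapping")[0].tolist()]
    with torch.no_grad():
        hidden = model(**encoded).last_hidden_state[0]  # shape: (tokens, hidden_size)

    indices = phrase_token_indices(offsets, char_start, char_end)
    return mean_pool(hidden.tolist(), indices)
```

Calling `embed_phrase(sentence, "no response")` once for each of the two sentences in the question should give two different vectors, because every sub-word vector depends on the whole input. Averaging the last layer is only one possible choice; other layers or pooling schemes may suit a given task better.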

Upvotes: 2
