Reputation: 628
I have tried different approaches to sentence similarity, namely:
- spaCy models: en_core_web_md and en_core_web_lg.
- Transformers: using the packages sentence-similarity and sentence-transformers, I've tried models such as distilbert-base-uncased, bert-base-uncased, or sentence-transformers/all-mpnet-base-v2.
- Universal Sentence Encoder: using the package spacy-universal-sentence-encoder, with the models en_use_md and en_use_cmlm_lg.
However, while these models generally detect similarity correctly for equivalent sentences, they all fail when given negated sentences. E.g., these opposite sentences:
I like rainy days because they make me feel relaxed.
I don't like rainy days because they don't make me feel relaxed.
return a similarity of 0.931 with the model en_use_md.
However, sentences that could be considered very similar:
I like rainy days because they make me feel relaxed.
I enjoy rainy days because they make me feel calm.
return a smaller similarity: 0.914.
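For reference, these scores come from the usual cosine-similarity comparison of sentence embeddings. A minimal sketch with the spacy-universal-sentence-encoder package (assuming its load_model helper; exact scores may vary slightly by version):
import spacy_universal_sentence_encoder

# Load the medium Universal Sentence Encoder pipeline
nlp = spacy_universal_sentence_encoder.load_model('en_use_md')

doc_a = nlp("I like rainy days because they make me feel relaxed.")
doc_b = nlp("I don't like rainy days because they don't make me feel relaxed.")

# doc.similarity is the cosine similarity of the USE sentence embeddings
print(doc_a.similarity(doc_b))  # similarity ~0.93: very high, despite the sentences being opposites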
My question is: Is there any way around this? Are there any other models/approaches that take into account the affirmative/negative nature of sentences when calculating similarity?
Upvotes: 8
Views: 2715
Reputation: 1
I used the model dmlls/all-mpnet-base-v2-negation and compared the two sentences "I like rainy days because they make me feel relaxed." and "I don't like rainy days because they don't make me feel relaxed.", and I am getting a cosine similarity of 0.74, which is quite high. How did you get the score of 0.38? Sharing the complete code below:
import torch
from torch.nn.functional import cosine_similarity  # imports were not shown originally; torch's cosine_similarity is assumed here
from transformers import AutoConfig, AutoModel, AutoTokenizer

config = AutoConfig.from_pretrained('dmlls/all-mpnet-base-v2-negation')
# Note: from_config builds the architecture from the config only; it does not load the pretrained weights
model = AutoModel.from_config(config)
tokenizer = AutoTokenizer.from_pretrained('dmlls/all-mpnet-base-v2-negation')

a = 'I like rainy days because they make me feel relaxed.'
b = "I don't like rainy days because they don't make me feel relaxed."

inputs_a = tokenizer(a, return_tensors='pt', padding=True, truncation=True)
inputs_b = tokenizer(b, return_tensors='pt', padding=True, truncation=True)

with torch.no_grad():
    outputs_a = model(**inputs_a)
    outputs_b = model(**inputs_b)

# Use the embedding of the first ([CLS]) token as the sentence embedding
embeddings_a = outputs_a.last_hidden_state[:, 0, :]
embeddings_b = outputs_b.last_hidden_state[:, 0, :]

similarity_prob = cosine_similarity(embeddings_a, embeddings_b)
print(similarity_prob)
Upvotes: -1
Reputation: 628
Follow-up on my question:
We recently published the paper "This is not correct! Negation-aware Evaluation of Language Generation Systems", which addresses this problem.
A number of artifacts were released as a result of our work.
Coming back to the examples in the question, the model dmlls/all-mpnet-base-v2-negation reports the following scores:
I like rainy days because they make me feel relaxed.
I don't like rainy days because they don't make me feel relaxed.
Cosine similarity: 0.386
I like rainy days because they make me feel relaxed.
I enjoy rainy days because they make me feel calm.
Cosine similarity: 0.948
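For reference, this kind of comparison can be reproduced with the sentence-transformers package (a minimal sketch, assuming the package is installed):
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("dmlls/all-mpnet-base-v2-negation")

pairs = [
    ("I like rainy days because they make me feel relaxed.",
     "I don't like rainy days because they don't make me feel relaxed."),
    ("I like rainy days because they make me feel relaxed.",
     "I enjoy rainy days because they make me feel calm."),
]

for a, b in pairs:
    # encode() returns one embedding per sentence; cos_sim compares them
    emb_a, emb_b = model.encode([a, b], convert_to_tensor=True)
    print(round(util.cos_sim(emb_a, emb_b).item(), 3))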
While admittedly this work does not completely solve the negation problem in modern NLP models, we believe it is a step in the right direction, and hopefully useful to the NLP community!
Upvotes: 3
Reputation: 14993
Your question is pertinent, and I believe this thought has crossed everybody's mind at some point.
If you want to evaluate the logical connection between two sentences, using cosine similarity or euclidean distance on top of some pre-determined embeddings will not suffice.
The actual logical connection between two sentences can be determined via an RTE task (recognizing textual entailment).
The Multi-Genre Natural Language Inference (MultiNLI) corpus (https://cims.nyu.edu/~sbowman/multinli/) is a dataset built specifically for this task of TE (textual entailment, in the context of natural language inference). In essence, there are 3 labels (contradiction, neutral, and entailment). For example:
At the other end of Pennsylvania Avenue, people began to line up for a White House tour.
People formed a line at the end of Pennsylvania Avenue.
In this case, there is an entailment between the two sentences.
HuggingFace also has some pre-built models for MNLI. You can check models such as distilbert-base-uncased-mnli or roberta-large-mnli, which are specifically fine-tuned for this task; consider them as starting points for your task.
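As a rough sketch of how such a model can be queried (standard transformers API; the label names are read from the model config rather than assumed):
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

premise = "I like rainy days because they make me feel relaxed."
hypothesis = "I don't like rainy days because they don't make me feel relaxed."

# Encode the (premise, hypothesis) pair and get contradiction/neutral/entailment probabilities
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

for i, p in enumerate(probs):
    print(model.config.id2label[i], round(p.item(), 3))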
Upvotes: 5
Reputation: 15593
Handling negation is one of the hard problems in NLP.
A lot of similarity methods work by averaging the vectors of the words in a sentence, in which case one sentence is just the other plus the vector for the word "not", which is not going to be very different. Opposites are also frequently discussed together, so they're "similar" in that sense, which is how the word "similar" is usually used in NLP.
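To make the averaging point concrete, here is a small sketch with spaCy's en_core_web_md, whose doc vectors are averages of the token vectors; the negated sentence only adds a couple of extra token vectors, so the similarity stays high:
import spacy

# en_core_web_md doc vectors are the average of the token vectors
nlp = spacy.load("en_core_web_md")

doc_a = nlp("I like rainy days because they make me feel relaxed.")
doc_b = nlp("I don't like rainy days because they don't make me feel relaxed.")

# The second sentence only adds the vectors for "do"/"n't", so the averaged
# representations, and hence the cosine similarity, barely change.
print(doc_a.similarity(doc_b))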
There are ways to work around this, often employed in sentiment analysis, but they usually don't "just work". If you can narrow down what kinds of negation you expect to see you might have more success. negspaCy is an unofficial spaCy component that can help detect negation of named entities, which is often useful in medical text ("does not have cancer"), for example. But you have to figure out what to do with that information, and it doesn't help with similarity scores.
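For illustration, typical negspaCy usage looks roughly like this (a sketch based on the package's documented spaCy v3 API; entity types and domain-specific config may need adjusting):
import spacy
from negspacy.negation import Negex  # importing registers the "negex" pipeline factory

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex")

doc = nlp("She does not like Steve Jobs but likes Apple products.")
for ent in doc.ents:
    # ent._.negex is True when the entity falls inside a negated scope
    print(ent.text, ent._.negex)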
You might have some luck using models trained to classify entailment - which classify whether some statement implies, contradicts, or has no bearing on another statement.
Upvotes: 3