Reputation: 628
I have tried different approaches to sentence similarity, namely:
- spaCy models: en_core_web_md and en_core_web_lg.
- Transformers: using the packages sentence-similarity and sentence-transformers, I've tried models such as distilbert-base-uncased, bert-base-uncased, or sentence-transformers/all-mpnet-base-v2.
- Universal Sentence Encoder: using the package spacy-universal-sentence-encoder, with the models en_use_md and en_use_cmlm_lg.
However, while these models generally detect similarity correctly for equivalent sentences, they all fail when given negated sentences. E.g., these opposite sentences:
I like rainy days because they make me feel relaxed.
I don't like rainy days because they don't make me feel relaxed.
return a similarity of 0.931 with the model en_use_md.
However, sentences that could be considered very similar:
I like rainy days because they make me feel relaxed.
I enjoy rainy days because they make me feel calm.
return a smaller similarity: 0.914.
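For reference, these scores come from the usual cosine-similarity comparison of sentence embeddings. A minimal sketch with the spacy-universal-sentence-encoder package (assuming its load_model helper; exact scores may vary slightly by version):
import spacy_universal_sentence_encoder

# Load the medium Universal Sentence Encoder pipeline
nlp = spacy_universal_sentence_encoder.load_model('en_use_md')

doc_a = nlp("I like rainy days because they make me feel relaxed.")
doc_b = nlp("I don't like rainy days because they don't make me feel relaxed.")

# doc.similarity is the cosine similarity of the USE sentence embeddings
print(doc_a.similarity(doc_b))  # similarity ~0.93: very high, despite the sentences being opposites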
My question is: Is there any way around this? Are there any other models/approaches that take into account the affirmative/negative nature of sentences when calculating similarity?
Upvotes: 8
Views: 2715
Reputation: 1
I used the model dmlls/all-mpnet-base-v2-negation and compared the two sentences "I like rainy days because they make me feel relaxed." and "I don't like rainy days because they don't make me feel relaxed.", and I am getting a cosine similarity of 0.74, which is quite high. How did you get the score of 0.38? Sharing the complete code below:
import torch
from torch.nn.functional import cosine_similarity  # imports were not shown originally; torch's cosine_similarity is assumed here
from transformers import AutoConfig, AutoModel, AutoTokenizer

config = AutoConfig.from_pretrained('dmlls/all-mpnet-base-v2-negation')
# Note: from_config builds the architecture from the config only; it does not load the pretrained weights
model = AutoModel.from_config(config)
tokenizer = AutoTokenizer.from_pretrained('dmlls/all-mpnet-base-v2-negation')

a = 'I like rainy days because they make me feel relaxed.'
b = "I don't like rainy days because they don't make me feel relaxed."

inputs_a = tokenizer(a, return_tensors='pt', padding=True, truncation=True)
inputs_b = tokenizer(b, return_tensors='pt', padding=True, truncation=True)

with torch.no_grad():
    outputs_a = model(**inputs_a)
    outputs_b = model(**inputs_b)

# Use the embedding of the first ([CLS]) token as the sentence embedding
embeddings_a = outputs_a.last_hidden_state[:, 0, :]
embeddings_b = outputs_b.last_hidden_state[:, 0, :]

similarity_prob = cosine_similarity(embeddings_a, embeddings_b)
print(similarity_prob)
Upvotes: -1
Reputation: 628
Follow-up on my question:
We recently published the paper "This is not correct! Negation-aware Evaluation of Language Generation Systems", which addresses this problem.
A number of artifacts were released as a result of our work.
Coming back to the examples in the question, the model dmlls/all-mpnet-base-v2-negation reports the following scores:
I like rainy days because they make me feel relaxed.
I don't like rainy days because they don't make me feel relaxed.
Cosine similarity: 0.386
I like rainy days because they make me feel relaxed.
I enjoy rainy days because they make me feel calm.
Cosine similarity: 0.948
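For reference, this kind of comparison can be reproduced with the sentence-transformers package (a minimal sketch, assuming the package is installed):
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("dmlls/all-mpnet-base-v2-negation")

pairs = [
    ("I like rainy days because they make me feel relaxed.",
     "I don't like rainy days because they don't make me feel relaxed."),
    ("I like rainy days because they make me feel relaxed.",
     "I enjoy rainy days because they make me feel calm."),
]

for a, b in pairs:
    # encode() returns one embedding per sentence; cos_sim compares them
    emb_a, emb_b = model.encode([a, b], convert_to_tensor=True)
    print(round(util.cos_sim(emb_a, emb_b).item(), 3))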
While admittedly this work does not completely solve the negation problem in modern NLP models, we believe it is a step in the right direction, and hopefully useful to the NLP community!
Upvotes: 3
Reputation: 14993
Your question is pertinent, and I believe this thought has crossed everybody's mind at some point.
If you want to evaluate the logical connection between two sentences, using cosine similarity or euclidean distance on top of some pre-determined embeddings will not suffice.
The actual logical connection between two sentences can be determined via an RTE task (recognizing textual entailment).
The Multi-Genre Natural Language Inference (MultiNLI) corpus (https://cims.nyu.edu/~sbowman/multinli/) is a dataset built specifically for this task of TE (textual entailment, in the context of natural language inference). In essence, there are 3 labels (contradiction, neutral, and entailment). For example:
At the other end of Pennsylvania Avenue, people began to line up for a White House tour.
People formed a line at the end of Pennsylvania Avenue.
In this case, there is an entailment between the two sentences.
HuggingFace also has some pre-built models for MNLI. You can check models such as distilbert-base-uncased-mnli or roberta-large-mnli, which are specifically fine-tuned for this task; consider them as starting points for your task.
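As a rough sketch of how such a model can be queried (standard transformers API; the label names are read from the model config rather than assumed):
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

premise = "I like rainy days because they make me feel relaxed."
hypothesis = "I don't like rainy days because they don't make me feel relaxed."

# Encode the (premise, hypothesis) pair and get contradiction/neutral/entailment probabilities
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

for i, p in enumerate(probs):
    print(model.config.id2label[i], round(p.item(), 3))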
Upvotes: 5
Reputation: 15593
Handling negation is one of the hard problems in NLP.
A lot of similarity methods work by averaging the vectors of the words in a sentence, in which case one sentence is just the other plus the vector for the word "not", which is not going to be very different. Opposites are also frequently discussed together, so they're "similar" in that sense, which is how the word "similar" is usually used in NLP.
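To make the averaging point concrete, here is a small sketch with spaCy's en_core_web_md, whose doc vectors are averages of the token vectors; the negated sentence only adds a couple of extra token vectors, so the similarity stays high:
import spacy

# en_core_web_md doc vectors are the average of the token vectors
nlp = spacy.load("en_core_web_md")

doc_a = nlp("I like rainy days because they make me feel relaxed.")
doc_b = nlp("I don't like rainy days because they don't make me feel relaxed.")

# The second sentence only adds the vectors for "do"/"n't", so the averaged
# representations, and hence the cosine similarity, barely change.
print(doc_a.similarity(doc_b))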
There are ways to work around this, often employed in sentiment analysis, but they usually don't "just work". If you can narrow down what kinds of negation you expect to see you might have more success. negspaCy is an unofficial spaCy component that can help detect negation of named entities, which is often useful in medical text ("does not have cancer"), for example. But you have to figure out what to do with that information, and it doesn't help with similarity scores.
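For illustration, typical negspaCy usage looks roughly like this (a sketch based on the package's documented spaCy v3 API; entity types and domain-specific config may need adjusting):
import spacy
from negspacy.negation import Negex  # importing registers the "negex" pipeline factory

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex")

doc = nlp("She does not like Steve Jobs but likes Apple products.")
for ent in doc.ents:
    # ent._.negex is True when the entity falls inside a negated scope
    print(ent.text, ent._.negex)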
You might have some luck using models trained to classify entailment - which classify whether some statement implies, contradicts, or has no bearing on another statement.
Upvotes: 3