Leonid Ganeline

Reputation: 616

spaCy similarity greater than 1

spaCy similarity sometimes behaves strangely. If we compare two identical texts, we get a score of 1.0, but if the texts are almost identical, we can get a score > 1. This behavior could break our code. Why do we get a score greater than 1.0, and can we predict when it will happen?

import spacy

nlp = spacy.load('en_core_web_md')

def calc_score(text_source, text_target):
    # Doc.similarity compares the vectors of the two parsed texts
    return nlp(text_source).similarity(nlp(text_target))

calc_score('software development', 'Software development')
# 1.0000000155153665

Upvotes: 0

Views: 592

Answers (1)

Jules Gagnon-Marchand

Reputation: 3781

From https://spacy.io/usage/vectors-similarity:

Identical tokens are obviously 100% similar to each other (just not always exactly 1.0, because of vector math and floating point imprecisions).
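To illustrate the floating-point point in that quote, here is a minimal sketch of a cosine similarity computed by hand in 32-bit floats. It is an illustration only, not spaCy's actual code: the `cosine` helper, the random 300-dimensional vector, and the seed are all assumptions made for the example. The result for a vector compared with itself lands very close to 1.0, but rounding in the dot product and the norms means it is not guaranteed to be exactly 1.0, and it can fall slightly on either side.

```python
import numpy as np

def cosine(a, b):
    # Hypothetical helper: plain cosine similarity, dot product over the product of norms.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Assumed stand-in for a document vector: 300 random float32 values.
rng = np.random.default_rng(0)
v = rng.standard_normal(300).astype(np.float32)

sim = cosine(v, v)
# Mathematically this is 1, but the float32 result may differ in the last digits.
print(repr(float(sim)))
```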

Just clip the score to the valid range with np.clip, as per https://stackoverflow.com/a/13232356/447599
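A minimal sketch of that clipping, assuming the valid range is [-1, 1] (the range of a cosine similarity). The `clip_similarity` name and its `low`/`high` parameters are made up for this example; only `np.clip` is a real NumPy function. The raw value is the one reported in the question.

```python
import numpy as np

def clip_similarity(raw_score, low=-1.0, high=1.0):
    # Hypothetical wrapper: clamp a similarity score into [low, high].
    return float(np.clip(raw_score, low, high))

# Score reported in the question, slightly above 1.0 because of rounding.
raw = 1.0000000155153665
clipped = clip_similarity(raw)
print(clipped)
```

In the question's code this would mean wrapping the return value of calc_score, e.g. clip_similarity(nlp(text_source).similarity(nlp(text_target))). Scores already inside the range pass through unchanged.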

Upvotes: 1
