MightyTreeFrog

Reputation: 55

NLP: Finding which sentence is closest in meaning to a list of other sentences

I have two lists of sentences (list A and list B). I want to find which sentence in A is closest in meaning to the entirety of B.

This is not the same as the standard cosine similarity check you can do when comparing two doc objects (in spaCy, for example): even if I iterate through A and compare each element of A to all elements of B, that leaves me with a collection of cosine similarity scores, while I want a single number representing the closeness of each element of A to all of B.

So far I have tried the following: for every element in A, perform a cosine similarity check with every element in B, leaving me with a list of values equal in length to B. Then I calculate the average of this list, giving me a single value which would ideally represent how close that element of A is to all of B.
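For reference, a minimal sketch of that averaging approach, assuming spaCy with a model that ships word vectors (e.g. en_core_web_md); the sentences are placeholders:

```python
import spacy

nlp = spacy.load("en_core_web_md")  # needs a model with word vectors

A = ["Stocks fell sharply today.", "The cat sat on the mat."]
B = ["Markets dropped at the open.", "Investors sold off shares."]

docs_a = [nlp(s) for s in A]
docs_b = [nlp(s) for s in B]

avg_scores = []
for doc_a in docs_a:
    # one cosine similarity per element of B, collapsed to a mean
    sims = [doc_a.similarity(doc_b) for doc_b in docs_b]
    avg_scores.append(sum(sims) / len(sims))

# the element of A whose average similarity to B is highest
print(A[avg_scores.index(max(avg_scores))])
```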

The issue with that approach is that the averaging causes too much information loss: by the time I've done this for all elements of A, there isn't much difference between these condensed averages, so it's hard to conclude which element of A is closest to all of B.

P.S. I can show code if asked, but I feel it's irrelevant because the issue is with the approach itself, not broken code.

Upvotes: 1

Views: 876

Answers (1)

pmbaumgartner

Reputation: 712

I have a few approaches for when I run into a similar problem; for me it's often comparing new documents to a cluster of documents and finding which cluster is "most similar."

First, a sidebar: you can totally do this in spaCy, but if you're dealing with sentences or shorter paragraphs, you might want to try embedding them with a model from SentenceTransformers. spaCy's document embeddings are just the average of the word embeddings, so embedding the full document with a model designed for that may give you better results.
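A minimal sketch of that embedding step (the model name all-MiniLM-L6-v2 is just a common default, not the only choice, and the sentences are placeholders):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

A = ["Stocks fell sharply today.", "The cat sat on the mat."]
B = ["Markets dropped at the open.", "Investors sold off shares."]

emb_a = model.encode(A)  # numpy array, shape (len(A), dim)
emb_b = model.encode(B)  # numpy array, shape (len(B), dim)
```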

Assuming you have lists of documents A and B, and embeddings for both, the first thing I would do, instead of averaging the cosine similarities, is average the embeddings of B, then find the cosine similarity between each item in A and this averaged B embedding.
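Continuing from the emb_a / emb_b arrays in the sketch above, the centroid version would look roughly like this (util.cos_sim comes from sentence_transformers):

```python
from sentence_transformers import util

centroid_b = emb_b.mean(axis=0)          # one vector standing in for all of B
sims = util.cos_sim(emb_a, centroid_b)   # tensor of shape (len(A), 1)

best = int(sims.argmax())
print(A[best], float(sims[best]))        # element of A closest to B's centroid
```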

Sometimes, as you experienced with your original approach, averaging results in a loss of information. So, going back to our lists of embeddings for A and B: another approach I take, especially if the documents in B are highly variable in content, is to find, for each document in A, the document in B with the max cosine similarity value (sketched below). The benefit here is that you might find "clusters" of similar documents and be able to evaluate those. That helps because the "meaning of the entirety of B" isn't well defined, especially if B contains lots of documents, and this is a nice way to decompose both A and B and better understand which groups of documents are similar between them.
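A sketch of that max-similarity variant, reusing the same emb_a / emb_b from above:

```python
from sentence_transformers import util

pairwise = util.cos_sim(emb_a, emb_b)    # tensor of shape (len(A), len(B))
best_idx = pairwise.argmax(dim=1)        # index of each A item's best match in B
best_val = pairwise.max(dim=1).values    # that match's similarity score

for i, sent in enumerate(A):
    print(f"{sent!r} -> {B[int(best_idx[i])]!r} ({float(best_val[i]):.3f})")
```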

Whatever you choose, I hope you post back with the results!

Upvotes: 1
