vmk

Reputation: 105

How to find that one text is similar to a part of another?

We know how to assess the similarity of two whole texts, for example with Word Mover’s Distance. How can we find a passage inside one text that is similar to another text?

Upvotes: 0

Views: 159

Answers (1)

gojomo

Reputation: 54183

You could break the text into chunks – ideally by natural groupings, like sentences or paragraphs – then do pairwise comparisons of every chunk against every other, using some text-distance measure.
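As a rough sketch of that chunk-and-compare idea: the snippet below splits the longer text into sentences with a naive regex and scores each sentence against the shorter text. The bag-of-words cosine similarity here is only a stand-in I picked so the example runs with the standard library; you'd swap in whatever text-distance measure you actually prefer. The function names (`split_sentences`, `most_similar_chunk`) are made up for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(text):
    # Lowercased word counts; a placeholder for a better text representation.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two word-count Counters.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar_chunk(query, document):
    # Score every sentence of the document against the query, keep the best.
    # Assumes the document contains at least one sentence.
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(c)), c) for c in split_sentences(document)]
    return max(scored, key=lambda pair: pair[0])


document = (
    "The weather was mild all week. "
    "The central bank raised interest rates again. "
    "Our team won the final match."
)
query = "interest rates were raised by the bank"
score, chunk = most_similar_chunk(query, document)
print(round(score, 3), chunk)
```

Sentences are just one choice of chunk; paragraphs or sliding windows of N words would slot into the same loop.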

Word Mover's Distance can give impressive results, but it is quite slow and expensive to calculate, especially for large texts and large numbers of pairwise comparisons. Simpler summary vectors for a text (such as a plain average of all the text's word-vectors, or a text-vector learned from the text, like 'Paragraph Vector', aka Doc2Vec) will be much faster. They might be good enough on their own, or at least serve as a quick first pass to limit the number of candidate pairs on which you run something more expensive.
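To make the "cheap first pass" idea concrete, here is a minimal sketch that averages word-vectors per chunk and keeps only the top few chunks by cosine similarity. The 3-dimensional `TOY_VECTORS` table is invented purely for illustration (real vectors would come from a trained word-vector model), and `shortlist` is a hypothetical helper name, not part of any library.

```python
import math

# Toy 3-d word vectors, invented for this example only.
TOY_VECTORS = {
    "bank": (0.9, 0.1, 0.0),
    "rates": (0.8, 0.2, 0.0),
    "interest": (0.85, 0.15, 0.0),
    "loan": (0.8, 0.1, 0.1),
    "match": (0.0, 0.9, 0.1),
    "team": (0.1, 0.9, 0.0),
    "goal": (0.0, 0.8, 0.2),
    "rain": (0.0, 0.1, 0.9),
    "weather": (0.1, 0.0, 0.9),
}


def mean_vector(text):
    # Average the vectors of all known words; None if no word is known.
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def shortlist(query, chunks, keep=2):
    # Cheap first pass: rank chunks by averaged-vector similarity and
    # return the best `keep`, to be re-scored by a slower measure later.
    q = mean_vector(query)
    if q is None:
        return []
    scored = []
    for chunk in chunks:
        v = mean_vector(chunk)
        if v is not None:
            scored.append((cosine(q, v), chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:keep]]


chunks = [
    "team scored a goal in the match",
    "bank raised interest rates",
    "rain and cold weather",
    "a loan from the bank",
]
candidates = shortlist("interest rates on a loan", chunks, keep=2)
print(candidates)
```

Only the chunks that survive the shortlist would then go through the expensive comparison (for instance Word Mover's Distance), which keeps the number of slow pairwise calls small.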

Upvotes: 1
