Reputation: 105
We know how to assess the similarity of two whole texts, for example with Word Mover’s Distance. But how can we find a passage inside one text that is similar to another text?
Upvotes: 0
Views: 159
Reputation: 54183
You could break the text into chunks – ideally by natural groupings, like sentences or paragraphs – then do pairwise comparisons of every chunk against every other, using some text-distance measure.
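As a rough illustration of that chunk-and-compare idea, here is a minimal sketch. The function names (`split_sentences`, `jaccard_distance`, `rank_chunk_pairs`) are made up for this example, the regex sentence splitter is deliberately naive, and Jaccard distance over word sets is only a cheap stand-in for whatever text-distance measure you actually prefer:

```python
import re
from itertools import product


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def jaccard_distance(a, b):
    """Placeholder distance: 1 - |A & B| / |A | B| over lowercase word sets."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def rank_chunk_pairs(text_a, text_b, distance=jaccard_distance):
    """Score every chunk of text_a against every chunk of text_b.

    Returns (distance, chunk_a, chunk_b) tuples, closest pair first.
    """
    chunks_a = split_sentences(text_a)
    chunks_b = split_sentences(text_b)
    scored = [(distance(a, b), a, b) for a, b in product(chunks_a, chunks_b)]
    return sorted(scored, key=lambda item: item[0])
```

The `distance` argument is there so a heavier measure can be swapped in later without touching the pairing logic. Note that the number of comparisons grows with the product of the two chunk counts.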
Word Mover's Distance can give impressive results, but it is quite slow and expensive to calculate, especially for large texts and large numbers of pairwise comparisons. Simpler summary vectors for a text (such as a plain average of all the text's word-vectors, or a vector learned from the text like 'Paragraph Vector', aka Doc2Vec) will be much faster. They might be good enough on their own, or at least serve as a quick first pass to limit the number of candidate pairs you run something more expensive on.
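The quick first pass could look something like the sketch below: average the word-vectors of each chunk, rank chunks by cosine similarity to the query, and keep only the top few for the expensive measure. Everything here is an assumption for illustration: `TOY_VECTORS` is a hand-written 3-dimensional table, not output from a trained model, and `average_vector`, `cosine_similarity` and `shortlist` are hypothetical helper names.

```python
import math

# Hypothetical toy embeddings; real ones would come from a trained model.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.85, 0.15, 0.05],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.05, 0.15, 0.85],
}


def average_vector(tokens, vectors):
    """Mean of the vectors of all known tokens, or None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def shortlist(query_tokens, chunks, vectors, top_k=2):
    """Return indices of the top_k chunks most similar to the query."""
    query_vec = average_vector(query_tokens, vectors)
    if query_vec is None:
        return []
    scored = []
    for idx, tokens in enumerate(chunks):
        chunk_vec = average_vector(tokens, vectors)
        if chunk_vec is not None:
            scored.append((cosine_similarity(query_vec, chunk_vec), idx))
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:top_k]]
```

Only the chunks whose indices come back from `shortlist` would then be passed to the slower measure, which keeps the expensive comparisons to a small, fixed number per query.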
Upvotes: 1