Reputation: 41
I need to compute a similarity score between two texts when one of them is contained in the other.
For example:
Text1: aaa bbb ccc ddd eee
Text2: bbb ccc
I need something that tells me that Text2 is 100% inside Text1. Is there some way to do this?
Upvotes: 4
Views: 629
Reputation: 20621
Please see the book Mining of Massive Datasets and Dekang Lin's definition of similarity (PDF). Neither requires Lucene.
Upvotes: 0
Reputation: 8553
You don't need Lucene to obtain the similarity between texts. There are several measures available, depending on the text length, the type of strings, etc., and you will need to experiment to see which gives you the best results.
A pretty good and comprehensive collection of algorithms is available in SimMetrics, an F/OSS library that offers an extensive collection of similarity algorithms and their corresponding cost functions.
Upvotes: 0
Reputation: 39207
Depending on what you want you may try
Both will give you 1 if Text2 is completely inside Text1 and 0 if the two texts do not share a single character.
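As a minimal sketch, one measure that behaves this way is the length of the longest common substring divided by the length of the shorter text. This is an assumed choice for illustration, not necessarily one of the measures meant above, and the function names `longest_common_substring_length` and `containment_score` are hypothetical.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    # prev[j] = length of the common run ending at the previous char of a and b[j-1]
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def containment_score(text: str, fragment: str) -> float:
    """Assumed measure: longest common substring length / len(fragment).

    Returns 1.0 when fragment occurs verbatim in text and 0.0 when the
    two strings share no character at all.
    """
    if not fragment:
        return 0.0  # assumption: an empty fragment scores 0
    return longest_common_substring_length(text, fragment) / len(fragment)


print(containment_score("aaa bbb ccc ddd eee", "bbb ccc"))  # 1.0
```

This compares characters, so it is quadratic in the text lengths. For long documents you would probably want to compare word tokens instead, or use a suffix-based structure.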
Upvotes: 1