Reputation: 692
I am trying to devise a duplicate-detection scheme for my Lucene.net app. The problem is that it is hard to build a unique key, since many of the fields will be identical across documents. The only fields I know will differ are Title and Abstract. A key built from those has its own weakness, though: someone could change the title slightly and the document would be treated as unique. Basically, I am looking for a threshold, so that two documents that match 95% or more count as duplicates. Is there a way to do this with Lucene?
Upvotes: 1
Views: 177
Reputation: 21264
I'm unclear about your requirement for a unique key, but you can check out Lucene's FuzzyQuery, which matches terms that are similar (within a given edit distance) rather than identical. See these articles: Fuzzy Searches, FuzzyQuery.
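To make the "95% match" idea concrete, here is a minimal sketch in plain Python. It is not Lucene.net code and does not reproduce FuzzyQuery's scoring; it only illustrates edit-distance similarity applied to a whole title. The function names (`levenshtein`, `similarity`, `is_duplicate`), the lowercase/whitespace normalization, and the 0.95 default are all assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus edit distance over the longer length."""
    # Assumed normalization: ignore case and collapse runs of whitespace.
    a = " ".join(a.lower().split())
    b = " ".join(b.lower().split())
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_duplicate(a: str, b: str, threshold: float = 0.95) -> bool:
    """Treat two titles as duplicates when similarity meets the threshold."""
    return similarity(a, b) >= threshold
```

For example, `is_duplicate("Introduction to Lucene indexing", "Introduction to Lucene indexng")` is true, since one dropped letter in a 31-character title leaves the similarity above 0.95. Comparing every new title against every stored one this way does not scale, so in practice you would probably use an index query to narrow down the candidates first and apply a check like this only to those.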
Upvotes: 1