Reputation: 469
In Lucene, I can use fuzzy search to get 'similar' results.
For example, the following query:
text:awesome~0.8
will find documents containing terms that are about 80% similar, such as 'awesom'.
My question is: can I use fuzzy search on an entire text (multiple words)?
For example, I want to find texts that are 80% similar to the following text:
this is my text with multiple words
Putting a fuzzy clause on each word would not give me the desired results:
text:(+this~0.8 +is~0.8 +my~0.8 +text~0.8 +with~0.8 +multiple~0.8 +words~0.8)
That query would return only documents that contain every word in the query (or a word at least 80% similar to each one).
I expect the query to return results where the entire string is 80% similar, even if a whole word is missing, for example:
this is text with multiple words
Something like this:
text:(+this +is +my +text +with +multiple +words)~0.8
Obviously the above query gives a syntax error, but I need results based on the similarity of the entire text/phrase.
I am happy to use the Java API classes for this, as I need to do it from a Java program.
Upvotes: 1
Views: 1547
Reputation: 1076
I am not sure that a floating-point similarity is still allowed for fuzzy queries in Lucene. From Lucene 4.0 onwards, FuzzyQuery supports a maximum edit distance of 2.
Let's assume you want an edit distance of 2. You can use KeywordAnalyzer when indexing your field, so that the field value is not tokenized and is indexed as a single term. When searching, you can then use a FuzzyQuery whose term contains the full text.
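To get a feel for what an edit-distance budget of 2 means on a whole, untokenized string, here is a minimal plain-Java sketch. It does not use Lucene and is not how Lucene matches terms internally; the class name WholeTextSimilarity and both method names are made up for illustration, and the "similarity" is assumed to be 1 minus the Levenshtein distance divided by the longer string's length.

```java
// Illustrative only: hypothetical helper, not part of the Lucene API.
final class WholeTextSimilarity {

    // Classic dynamic-programming Levenshtein distance, keeping two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity measure: 1 - distance / length of the longer string.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are identical
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    public static void main(String[] args) {
        String indexed = "this is my text with multiple words";
        String candidate = "this is text with multiple words";
        System.out.println("edit distance = " + levenshtein(indexed, candidate));
        System.out.println("similarity    = " + similarity(indexed, candidate));
    }
}
```

For the two example texts from the question, dropping "my " costs three single-character deletions, so the distance is 3. Under the assumed measure the strings are well above 80% similar, yet they are still outside an edit-distance budget of 2.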
Limitations of this solution:
Upvotes: 1