Reputation: 2339
I have a Hibernate Search endpoint that needs to return the closest match for a group of words. When I run a search, the closest match does not appear in the first 10 results. Below is the Hibernate Search snippet:
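// Obtain the full-text entity manager that wraps the JPA entity manager
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
// Build the query DSL for the indexed entity
QueryBuilder qb = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Test.class).get();
// Keyword query: the analyzed tokens of searchTerm must match indexed tokens exactly
org.apache.lucene.search.Query luceneQuery = qb.keyword().onFields("arg")
        .matching(searchTerm).createQuery();
// Wrap the Lucene query as a JPA query returning Test entities
javax.persistence.Query jpaQuery = fullTextEntityManager.createFullTextQuery(luceneQuery, Test.class);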
How can I return the closest match for a group of words?
Upvotes: 1
Views: 168
Reputation: 9977
While full-text search can return "close matches" (i.e. to account for typos, etc.), you still need to opt in.
For approximate matches, you have two solutions:
If you go with solution #2, I suggest you have a look at these resources to familiarize yourself with full-text search:
(This is the documentation of Hibernate Search 6, but the concepts are the same as in Hibernate Search 5)
Then have a look at how to configure an analyzer in Hibernate Search 5.
Now you should have a better idea of what analyzers are: they transform the text, both when indexing and when querying, into tokens that will be matched exactly. Approximate matches are achieved through an approximate transformation: if analysis transforms "Résumé" into "resume", then the query "resume" will match a document containing "Résumé".
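To make the idea concrete, here is a minimal plain-Java sketch of that kind of transformation. It is not the Lucene or Hibernate Search API: the class `AnalysisSketch` and the method `analyze` are hypothetical names, and the code only imitates what an analyzer with whitespace tokenization, lowercasing, and accent folding would roughly do.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalysisSketch {

    // Hypothetical stand-in for an analyzer: fold accents, lowercase, split on whitespace.
    static List<String> analyze(String text) {
        // Decompose accented characters into base letter + combining mark...
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // ...then drop the combining marks, leaving only the base letters.
        String folded = decomposed.replaceAll("\\p{M}", "");
        List<String> tokens = new ArrayList<>();
        for (String token : folded.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "R\u00e9sum\u00e9" is the word "Resume" with an acute accent on both e's.
        System.out.println(analyze("R\u00e9sum\u00e9")); // [resume]
        System.out.println(analyze("resume"));           // [resume]
        System.out.println(analyze("Quick Brown Fox"));  // [quick, brown, fox]
    }
}
```

Because the same transformation is applied at indexing time and at query time, both spellings end up as the same token, and the exact token comparison succeeds.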
For example:
Document: "Quick Brown Fox" => "quick", "brown", "fox"
Queried: "Qick borwn fox" => "qick", "borwn", "fox"
Matching: "fox"
There's a typo in the query. The document should be high in the search hits, but it won't be because only one term matches, "fox".
To get even more approximate matches, one strategy is to break words down into what are called "ngrams". To that end, use NGramFilterFactory, like here for example.
If we set up analysis to break down words into 3-grams, we will get this:
Document: "quick brown fox" => "qui", "uic", "ick", "bro", "row", "own", "fox"
Queried: "qick borwn fox" => "qic", "ick", "bor", "orw", "rwn", "fox"
Matching: "ick", "fox"
Now it's a little better: two terms will match, "ick" and "fox". The document will be higher up in the result list.
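The 3-gram breakdown above can be reproduced with a small plain-Java sketch. This does not use NGramFilterFactory itself; `NGramSketch`, `ngrams`, and `shared` are hypothetical names, and the code only assumes whitespace tokenization plus lowercasing before the words are cut into fixed-size grams.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class NGramSketch {

    // Lowercase the text, split it on whitespace, and cut each word into n-grams.
    // Words shorter than n produce no grams in this sketch.
    static List<String> ngrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
        }
        return grams;
    }

    // Tokens present on both sides, in the order they appear in the first list.
    static Set<String> shared(List<String> indexed, List<String> queried) {
        Set<String> common = new LinkedHashSet<>(indexed);
        common.retainAll(queried);
        return common;
    }

    public static void main(String[] args) {
        List<String> indexed = ngrams("Quick Brown Fox", 3);
        List<String> queried = ngrams("Qick borwn fox", 3);
        System.out.println(indexed);                  // [qui, uic, ick, bro, row, own, fox]
        System.out.println(queried);                  // [qic, ick, bor, orw, rwn, fox]
        System.out.println(shared(indexed, queried)); // [ick, fox]
    }
}
```

Running the same comparison on whole words instead of 3-grams would leave only "fox" in common, which is why the n-gram setup ranks the document higher for the misspelled query.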
Of course, it's not perfect either:
As you can see, getting a full-text search that behaves just the way you want requires some work and configuration; there's no "one-size-fits-all" solution. You just need to try different configurations and see what suits you best.
Upvotes: 1