Reputation: 41
I have an analyzer with this configuration,
searchMapping//
.analyzerDef(BaseEntity.CUSTOM_SEARCH_INDEX_ANALYZER, WhitespaceTokenizerFactory.class)//
.filter(LowerCaseFilterFactory.class)//
.filter(ASCIIFoldingFilterFactory.class)//
.filter(NGramFilterFactory.class).param("minGramSize", "1").param("maxGramSize", "200");
This is how my entity field is configured
@Field(analyzer = @Analyzer(definition = CUSTOM_SEARCH_INDEX_ANALYZER))
private String bookName;
This is how I create a search query
queryBuilder.keyword().onField(prefixedPath).matching(matchingString).createQuery()
I have one entity with bookName="Gulliver" and another with bookName="xGulliver".
When I search bookName for "xG", I get both entities, whereas I would expect only the entity with bookName="xGulliver".
I also looked at the query produced by Hibernate Search:
Executing Lucene query '+(+(+(+( bookName:x bookName:xg bookName:g))))
I assume the Lucene query above is built with BooleanJunction::must conditions, which would mean a document has to match all of them.
So why am I still getting both entities? I don't understand.
I can also override the analyzer at query time, using KeywordTokenizer instead of NGramFilterFactory, but then I have to override it for each and every field before creating the QueryBuilder. That doesn't look good: I have about 100 indexed fields, some of them dynamic, and I create an individual query for each field.
Is there any other way to override the analyzer in version 5.11, or is this handled more easily in Hibernate Search 6.x?
The Hibernate Search versions I use are:
hibernate-search-elasticsearch, hibernate-search-orm = 5.11.4.Final
Upvotes: 0
Views: 781
Reputation: 9977
Above Lucene query is prepared using BooleanJunction::must conditions by Lucene I guess which means it should match all the conditions. Still why its giving me both entity data. I dont understand here.
When you create a keyword query using Hibernate Search, the string passed to that query is analyzed, and if there are multiple tokens, Hibernate Search creates a boolean query with one "should" clause for each token. You can see it here: " bookName:x bookName:xg bookName:g". There is no "+" sign before "bookName", which means those are not "must" clauses; they are "should" clauses. A document only needs to satisfy one of them, and the indexed n-grams of "Gulliver" include "g", so both entities match.
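The effect of "should" clauses combined with an n-gram filter can be reproduced without Lucene. The following is a minimal plain-Java sketch, not Hibernate Search or Lucene code: the class and method names are hypothetical, ASCII folding is left out, and the analyzer is assumed to receive a single whitespace-free word.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java simulation; all names here are hypothetical, not Lucene API.
class NGramShouldDemo {

    // Mimics lowercasing followed by an n-gram filter on one word.
    static Set<String> ngrams(String word, int minGram, int maxGram) {
        String token = word.toLowerCase(Locale.ROOT);
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    // "should" semantics: one shared token is enough for a match.
    static boolean matchesAnyToken(String indexedValue, String searchText) {
        Set<String> indexed = ngrams(indexedValue, 1, 200);
        for (String queryToken : ngrams(searchText, 1, 200)) {
            if (indexed.contains(queryToken)) {
                return true;
            }
        }
        return false;
    }

    // "must" semantics: every query token has to be present in the index.
    static boolean matchesAllTokens(String indexedValue, String searchText) {
        return ngrams(indexedValue, 1, 200).containsAll(ngrams(searchText, 1, 200));
    }

    public static void main(String[] args) {
        System.out.println(ngrams("xG", 1, 200));                   // [x, xg, g]
        System.out.println(matchesAnyToken("Gulliver", "xG"));      // true, because of "g"
        System.out.println(matchesAllTokens("Gulliver", "xG"));     // false, "x" is missing
        System.out.println(matchesAllTokens("xGulliver", "xG"));    // true
    }
}
```

With "should" semantics the single shared token "g" is enough for "Gulliver" to match, which is the behavior described in the question.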
I can also override the analyzer while querying by having KeywordTokenizer instead of NGramFilterFactory but this is like I have to override for each and every field before creating QueryBuilder which doesnt looks good because then I have to override all index fields which I have about 100 fields and some are dynamic fields and I create individual query for each field.
True, that's annoying.
Is there any other way to override the analyzer in 5.11 version
In 5.11, I don't think there is any other way to override analyzers.
If necessary and if you're using the Lucene backend, I believe you should be able to bypass the Hibernate Search DSL just for this specific query:
1. Get the analyzer: Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("myAnalyzerWithoutNGramTokenFilter")
2. Call analyzer.tokenStream(...) and use the TokenStream as appropriate. You'll get a list of tokens.
3. Build a Lucene Query: essentially it will be a boolean query with one TermQuery for each token.
4. Pass the Query to Hibernate Search as usual.
or is it handled in some other way in hibernate-search 6.x version in easier way?
It's dead simple in Hibernate Search 6 and later. There are two solutions:
- You can define not only an analyzer in your mapping (@FullTextField(analyzer = "myAnalyzer")), but also a "search" analyzer using @FullTextField(analyzer = "myAnalyzer", searchAnalyzer = "mySearchAnalyzer"). The "default" analyzer will be used when indexing, while the "search" analyzer will be used when searching (querying).
- You can override the analyzer at query time by calling .analyzer("mySearchAnalyzer") while building the predicate. There is an example in the reference documentation.
Note however that dynamic fields are not supported yet in Hibernate Search 6: HSEARCH-3273.
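To illustrate why a separate search analyzer addresses the original problem, here is a small plain-Java sketch of the idea: n-grams are produced only at indexing time, while the search text is merely split on whitespace and lowercased. It is a simulation under those assumptions, not Hibernate Search code; the class and method names are hypothetical and ASCII folding is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java simulation; all names here are hypothetical.
class SearchAnalyzerDemo {

    // Index-time analysis: whitespace split, lowercase, n-grams of size 1..200.
    static Set<String> indexTokens(String value) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String word : value.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            for (int start = 0; start < word.length(); start++) {
                for (int end = start + 1; end <= word.length() && end - start <= 200; end++) {
                    tokens.add(word.substring(start, end));
                }
            }
        }
        return tokens;
    }

    // Search-time analysis: whitespace split and lowercase only, no n-grams.
    static List<String> searchTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    // A value matches if any search token is among its indexed n-grams.
    static boolean matches(String indexedValue, String searchText) {
        Set<String> indexed = indexTokens(indexedValue);
        for (String token : searchTokens(searchText)) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(searchTokens("xG"));            // [xg]
        System.out.println(matches("Gulliver", "xG"));     // false
        System.out.println(matches("xGulliver", "xG"));    // true
    }
}
```

Because the search text stays a single token "xg", only values whose indexed n-grams contain "xg" match, so "Gulliver" no longer matches on "g" alone.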
Upvotes: 2