Hibernate Search: field indexed with an NGram filter returns incorrect results because the search term is also n-gram tokenized at query time

Question

I have an analyzer with this configuration:

searchMapping//
        .analyzerDef(BaseEntity.CUSTOM_SEARCH_INDEX_ANALYZER, WhitespaceTokenizerFactory.class)//
        .filter(LowerCaseFilterFactory.class)//
        .filter(ASCIIFoldingFilterFactory.class)//
        .filter(NGramFilterFactory.class).param("minGramSize", "1").param("maxGramSize", "200");

This is how my entity field is configured:

@Field(analyzer = @Analyzer(definition = CUSTOM_SEARCH_INDEX_ANALYZER))
private String bookName;

This is how I create a search query:

queryBuilder.keyword().onField(prefixedPath).matching(matchingString).createQuery()

I have one entity with bookName="Gulliver" and another entity with bookName="xGulliver".

If I search for bookName = xG, I get both entities, whereas I would expect only the entity with bookName="xGulliver". I also looked at the query that Hibernate Search produces:

Executing Lucene query '+(+(+(+( bookName:x bookName:xg bookName:g))))

I guess the Lucene query above is built from BooleanJunction::must conditions, which should mean that every condition has to match. So why does it still return both entities? I don't understand this.
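One possible reading of the logged query: the nested `+` prefixes are the must clauses, but the three terms inside the innermost parentheses carry no `+`, which in Lucene query syntax marks them as optional (should) clauses, so matching any one of them is enough. The following is a minimal plain-Java sketch of that reading. It does not use Lucene or Hibernate Search; `NgramQueryDemo`, `ngrams` and `matchesAny` are hypothetical names, and `ngrams` is only a simplified stand-in for the lowercase and n-gram filters, assuming the same analyzer is applied at query time.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class NgramQueryDemo {

    // Simplified stand-in for LowerCaseFilter + NGramFilter:
    // every lowercase substring whose length is between min and max.
    static Set<String> ngrams(String text, int min, int max) {
        String s = text.toLowerCase(Locale.ROOT);
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < s.length(); start++) {
            for (int len = min; len <= max && start + len <= s.length(); len++) {
                grams.add(s.substring(start, start + len));
            }
        }
        return grams;
    }

    // "Should" semantics: the document matches if it contains at least one query term.
    static boolean matchesAny(Set<String> indexedTerms, Set<String> queryTerms) {
        for (String term : queryTerms) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> gulliver = ngrams("Gulliver", 1, 200);
        Set<String> xGulliver = ngrams("xGulliver", 1, 200);

        // Search term analyzed with the n-gram analyzer, as in the logged query.
        Set<String> ngramQuery = ngrams("xG", 1, 200);
        // Search term kept as a single lowercase token, as a keyword-style query analyzer would.
        Set<String> keywordQuery = Collections.singleton("xg");

        System.out.println("n-gram query terms: " + ngramQuery);
        System.out.println("Gulliver  (n-gram query):  " + matchesAny(gulliver, ngramQuery));
        System.out.println("xGulliver (n-gram query):  " + matchesAny(xGulliver, ngramQuery));
        System.out.println("Gulliver  (keyword query): " + matchesAny(gulliver, keywordQuery));
        System.out.println("xGulliver (keyword query): " + matchesAny(xGulliver, keywordQuery));
    }
}
```

In this sketch "Gulliver" matches the n-gram query only through the single-character gram g, while the single-token query xg matches "xGulliver" alone, which is the behaviour I expected.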

I can also override the analyzer at query time, using a KeywordTokenizer instead of the NGramFilterFactory. However, I would have to apply that override to each and every field before creating the QueryBuilder, which doesn't look good: I have about 100 indexed fields, some of them are dynamic, and I create an individual query for each field.

Is there any other way to override the analyzer in version 5.11, or is this handled in an easier way in Hibernate Search 6.x?

The Hibernate Search versions I use are:

hibernate-search-elasticsearch, hibernate-search-orm = 5.11.4.Final

