Reputation: 37034
I have the following configuration for Hibernate Search:
@AnalyzerDef(name = "autocompleteNGramAnalyzer",
    // Split input into tokens according to the tokenizer
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        // Split tokens on word boundaries; catenateAll also emits the joined-up token
        @TokenFilterDef(factory = WordDelimiterFilterFactory.class,
            params = @Parameter(name = "catenateAll", value = "1")),
        // Normalize token text to lowercase, as the user is unlikely to
        // care about casing when searching for matches
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        // Index edge n-grams of 2 to 5 characters for autocomplete matching
        @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
            @Parameter(name = "minGramSize", value = "2"),
            @Parameter(name = "maxGramSize", value = "5")
        })
    })
The behaviour is really strange. I have a field with the value George Cain:

- if I search for Ge, it returns the value
- if I search for GeO, it returns the value
- if I search for GeOR, it returns nothing
- if I search for GeoR, it returns the value
- if I search for GEOR, it returns the value

What is wrong with GeOR? How can I fix this? Is it possible to debug this framework?
Upvotes: 0
Views: 559
Reputation: 37034
I customized WordDelimiterFilterFactory and now it works. By default the filter splits tokens on case changes and emits the resulting word parts (generateWordParts = 1), so a query like GeOR was presumably producing extra Ge/OR sub-tokens whose n-grams are never indexed for George Cain; disabling word-part generation leaves only the catenated token:
@TokenFilterDef(factory = WordDelimiterFilterFactory.class,
    params = {
        @Parameter(name = "catenateAll", value = "1"),
        // generateWordParts = 1 by default
        @Parameter(name = "generateWordParts", value = "0")
    })
Upvotes: 0
Reputation: 10519
First, try using Luke (https://github.com/DmitryKey/luke/releases) to see what has been indexed in your Lucene index. You will be able to see the tokens, which might help you understand what is happening.
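If you prefer to stay in code rather than use Luke, you can also obtain the analyzer from the SearchFactory and print the tokens it produces for a given input. A rough sketch, assuming Hibernate Search 5 with the Lucene backend (adjust to your versions):

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.hibernate.search.jpa.FullTextEntityManager;

public class TokenDebugger {

    // Prints every token the named analyzer produces for the given text,
    // e.g. compare dump(ftem, "autocompleteNGramAnalyzer", "George Cain")
    // with dump(ftem, "autocompleteNGramAnalyzer", "GeOR")
    public static void dump(FullTextEntityManager ftem, String analyzerName, String text) throws IOException {
        Analyzer analyzer = ftem.getSearchFactory().getAnalyzer(analyzerName);
        try (TokenStream stream = analyzer.tokenStream("anyField", new StringReader(text))) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                System.out.println(term.toString());
            }
            stream.end();
        }
    }
}

Comparing the output for the indexed value with the output for each of your search strings should make it obvious which tokens do or don't overlap.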
Be sure your analyzer is correctly defined on your field and that the analyzer is applied to your query too (it might be a good idea to show us how you define your field and how you execute your query).
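For reference, a query built with the Hibernate Search DSL applies the targeted field's analyzer to the search string automatically; something along these lines is what I would expect (entity and field names are only illustrative):

import java.util.List;

import javax.persistence.EntityManager;

import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;
import org.hibernate.search.query.dsl.QueryBuilder;

public class AutocompleteSearch {

    @SuppressWarnings("unchecked")
    public static List<Person> search(EntityManager entityManager, String input) {
        FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);

        QueryBuilder qb = ftem.getSearchFactory()
                .buildQueryBuilder()
                .forEntity(Person.class)
                .get();

        // keyword() analyzes the input with the analyzer of the targeted field,
        // so "GeOR" goes through the same filter chain as the indexed value
        org.apache.lucene.search.Query luceneQuery = qb.keyword()
                .onField("name")
                .matching(input)
                .createQuery();

        return ftem.createFullTextQuery(luceneQuery, Person.class).getResultList();
    }
}

If you build the Lucene query by hand instead (for example with a TermQuery), the search string is not analyzed at all, which often explains surprises like this one.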
If you end up thinking it's a bug, you can use our test case template at https://github.com/hibernate/hibernate-test-case-templates/tree/master/search/hibernate-search-lucene to provide us with a self-contained test case so that we can take a look.
Upvotes: 2