Reputation: 67
I want to use a custom Analyzer as the default one. Searching Hibernate Search's documentation online, I saw that it is possible to change it in the Hibernate configuration, in particular with the property "hibernate.search.analyzer". So I set this property:
<property name="hibernate.search.analyzer">Class of Analyzer </property>
My question is: how can I create an analyzer class to pass to this property? In particular I want to use "EdgeNGram". I tried passing the EdgeNGram tokenizer factory, but it doesn't work:
<property name="hibernate.search.analyzer">EdgeNGramTokenizerFactory.class</property>
Can you show me an example of a class that I can pass to this property? Thanks
Upvotes: 0
Views: 1634
Reputation: 9977
EDIT: Hibernate Search 6+ users, what follows is mostly irrelevant to you. Read this section of the documentation instead.
First, let me warn you that the default analyzer should generally be a general-purpose one: one that may not be great, but is good enough for most fields. As new requirements are added to your application, you are very unlikely to be able to use the same analyzer everywhere, and will ultimately have to use specific analyzers in at least some of your index fields. That's why I personally prefer to use org.apache.lucene.analysis.core.KeywordAnalyzer as the default, and to specify an analyzer wherever I need one.
EDIT: With Hibernate Search 6 this advice (using a keyword analyzer by default) has become less relevant, since keyword fields and full-text fields are clearly separated. Still, that's good advice for Hibernate Search 5.
Now, you've been warned: using an EdgeNGramTokenizerFactory for your default analyzer is probably a bad idea. If you still want to do it, read on...
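To make that warning concrete, here is a plain-Java sketch (no Lucene involved; the class AnalysisSketch and its methods are hypothetical) of the terms each option would roughly produce for a field value. It assumes, as I understand Lucene's EdgeNGramTokenizer, that the tokenizer does not split on whitespace and treats the whole field value as a single token. It also only approximates ASCIIFoldingFilter by stripping accents after Unicode decomposition:

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class AnalysisSketch {

    // Roughly what KeywordAnalyzer does: the whole value becomes one term, unchanged.
    static List<String> keyword(String value) {
        return Collections.singletonList(value);
    }

    // Approximation of an edge n-gram tokenizer followed by ASCII folding and lowercasing.
    // Assumption: the whole value is one token, so only its first maxGram chars are indexed.
    static List<String> edgeNGram(String value, int minGram, int maxGram) {
        List<String> terms = new ArrayList<>();
        int max = Math.min(maxGram, value.length());
        for (int len = minGram; len <= max; len++) {
            String gram = value.substring(0, len);
            // Rough stand-in for ASCIIFoldingFilter: decompose, then drop combining marks
            String folded = Normalizer.normalize(gram, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");
            // Stand-in for LowerCaseFilter
            terms.add(folded.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(keyword("Hello World"));             // [Hello World]
        System.out.println(edgeNGram("Hello World", 1, 5));     // [h, he, hel, hell, hello]
        System.out.println(edgeNGram("\u00C9lan vital", 1, 5)); // [e, el, ela, elan, elan ]
    }
}
```

With "Hello World", only prefixes of the first five characters end up in the index, so a search for "world" on such a field would find no match. That can be what you want for an autocomplete field, but probably not for every field by default.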
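The value of the default analyzer property doesn't have to be a class. It can be the fully qualified name of an Analyzer class, but here you want a custom analyzer, and writing your own analyzer class can be complex, so if you're not used to Lucene I wouldn't recommend it. Note also that "EdgeNGramTokenizerFactory.class" is not a fully qualified class name, and EdgeNGramTokenizerFactory is a tokenizer factory, not an Analyzer, so that value could not work.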
What you can do instead is use the name of a named analyzer, defined using an @AnalyzerDef annotation or an analysis definition provider. These definitions take "off the shelf" analysis components and assemble them into a fully-fledged analyzer, which is much easier to do.
So, for example, you can define this class, which is not an analyzer class, but rather a class that provides analyzer definitions:
package com.acme.search;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenizerFactory;
import org.hibernate.search.analyzer.definition.LuceneAnalysisDefinitionProvider;
import org.hibernate.search.analyzer.definition.LuceneAnalysisDefinitionRegistryBuilder;
public class CustomAnalyzerProvider implements LuceneAnalysisDefinitionProvider {
    @Override
    public void register(LuceneAnalysisDefinitionRegistryBuilder builder) {
        builder
            // Named analyzer, referenced by "hibernate.search.analyzer" below
            .analyzer( "myAnalyzer" )
                // Emit prefixes ("edge n-grams") of 1 to 5 characters
                .tokenizer( EdgeNGramTokenizerFactory.class )
                    .param( "minGramSize", "1" )
                    .param( "maxGramSize", "5" )
                // Replace accented characters with their ASCII equivalents, then lowercase
                .tokenFilter( ASCIIFoldingFilterFactory.class )
                .tokenFilter( LowerCaseFilterFactory.class );
    }
}
Then define the following properties in your persistence.xml:
<property name="hibernate.search.lucene.analysis_definition_provider">com.acme.search.CustomAnalyzerProvider</property>
<property name="hibernate.search.analyzer">myAnalyzer</property>
And you should be good to go.
EDIT: If you use the Elasticsearch integration, then 1) using a custom Lucene Analyzer class will never work and 2) you need to do this to define named analyzers instead:
Define this class, which is not an analyzer class, but rather a class that provides analyzer definitions:
package com.acme.search;
import org.hibernate.search.elasticsearch.analyzer.definition.ElasticsearchAnalysisDefinitionProvider;
import org.hibernate.search.elasticsearch.analyzer.definition.ElasticsearchAnalysisDefinitionRegistryBuilder;
public class CustomAnalyzerProvider implements ElasticsearchAnalysisDefinitionProvider {
    @Override
    public void register(ElasticsearchAnalysisDefinitionRegistryBuilder builder) {
        // Named analyzer, referenced by "hibernate.search.analyzer" below
        builder.analyzer( "myAnalyzer" )
                .withTokenizer( "myEdgeNgram" )
                // "asciifolding" and "lowercase" are built-in Elasticsearch token filters
                .withTokenFilters( "asciifolding", "lowercase" );
        // Custom tokenizer emitting prefixes of 1 to 5 characters
        builder.tokenizer( "myEdgeNgram" )
                .type( "edge_ngram" )
                .param( "min_gram", "1" )
                .param( "max_gram", "5" );
    }
}
Then define the following properties in your persistence.xml (note that the provider property is different from the one in my Lucene example):
<property name="hibernate.search.elasticsearch.analysis_definition_provider">com.acme.search.CustomAnalyzerProvider</property>
<property name="hibernate.search.analyzer">myAnalyzer</property>
If you need more information, the documentation might help.
Upvotes: 5