Christian Scarselli

Reputation: 67

Hibernate Search: changing the default Analyzer

I want to use a custom Analyzer as the default one. Searching online in the Hibernate Search documentation, I saw that it is possible to change it in the Hibernate configuration, in particular with the property "hibernate.search.analyzer". So I set this property:

<property name="hibernate.search.analyzer">Class of Analyzer </property>

My question is: how can I create an analyzer class to pass to this property? In particular, I want to use "EdgeNGram". I tried to pass the EdgeNGram tokenizer factory, but it does not work:

<property name="hibernate.search.analyzer">EdgeNGramTokenizerFactory.class</property>

Can you show me an example of a class that I can pass to this property? Thanks

Upvotes: 0

Views: 1634

Answers (1)

yrodiere

Reputation: 9977

EDIT: Hibernate Search 6+ users, what follows is mostly irrelevant to you. Read this section of the documentation instead.

First, let me warn you that the default analyzer should generally be a general-purpose one, one that may not be great, but is good enough for most fields. As new requirements are added to your application, you are very unlikely to be able to use the same analyzer everywhere, and will ultimately have to use specific analyzers in at least some of your index fields. That's why I personally prefer to use org.apache.lucene.analysis.core.KeywordAnalyzer as a default, and specify an analyzer wherever I need one.

EDIT: With Hibernate Search 6 this advice (using a keyword analyzer by default) has become less relevant, since keyword fields and full-text fields are clearly separated. Still, that's good advice for Hibernate Search 5.

Now, you've been warned: using an EdgeNGramTokenizerFactory for your default analyzer is probably a bad idea. If you still want to do it, read on...

The default analyzer doesn't have to be a class. The property can hold the fully qualified name of an Analyzer class (a tokenizer factory such as EdgeNGramTokenizerFactory is not an analyzer), but here you want a custom analyzer. Writing your own analyzer class can be complex, so if you're not used to Lucene I wouldn't recommend it.

What you can do instead is use the name of an analyzer definition, declared with an @AnalyzerDef annotation or through an analysis definition provider. These definitions take "off the shelf" analysis components and assemble them into a fully-fledged analyzer, which is much easier to do.
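To make the idea concrete, here is a toy model in plain Java. It is not Hibernate Search's implementation: the class NamedAnalyzerRegistry, its methods, and the whitespace/lowercase components are hypothetical stand-ins. It only illustrates the principle that a definition assembles existing components under a name, and that the default-analyzer setting then just holds that name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Toy model (hypothetical, not Hibernate Search internals) of named analyzer
// definitions: existing components are assembled under a name, and the
// default-analyzer setting only stores that name.
public class NamedAnalyzerRegistry {

    private final Map<String, Function<String, List<String>>> definitions = new HashMap<>();

    // Assembles one tokenizer and an ordered list of token filters into an analyzer.
    public static Function<String, List<String>> assemble(
            Function<String, List<String>> tokenizer, List<UnaryOperator<String>> filters) {
        return text -> {
            List<String> tokens = new ArrayList<>();
            for (String raw : tokenizer.apply(text)) {
                String token = raw;
                for (UnaryOperator<String> filter : filters) {
                    token = filter.apply(token);
                }
                tokens.add(token);
            }
            return tokens;
        };
    }

    public void register(String name, Function<String, List<String>> analyzer) {
        definitions.put(name, analyzer);
    }

    // Resolves the analyzer whose NAME is stored under "hibernate.search.analyzer".
    public Function<String, List<String>> defaultAnalyzer(Properties config) {
        String name = config.getProperty("hibernate.search.analyzer");
        Function<String, List<String>> analyzer = definitions.get(name);
        if (analyzer == null) {
            throw new IllegalArgumentException("No analyzer definition named " + name);
        }
        return analyzer;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for off-the-shelf components.
        Function<String, List<String>> whitespaceTokenizer =
                text -> Arrays.asList(text.trim().split("\\s+"));
        UnaryOperator<String> lowercase = token -> token.toLowerCase(Locale.ROOT);

        NamedAnalyzerRegistry registry = new NamedAnalyzerRegistry();
        registry.register("myAnalyzer", assemble(whitespaceTokenizer, List.of(lowercase)));

        Properties config = new Properties();
        config.setProperty("hibernate.search.analyzer", "myAnalyzer");
        System.out.println(registry.defaultAnalyzer(config).apply("Hello Hibernate Search"));
        // prints [hello, hibernate, search]
    }
}
```

The point of the design is that the configuration never references a class: it references a name, and the name resolves to something assembled from parts you did not have to write.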

So, for example, you can define this class, which is not an analyzer class, but rather a class that provides analyzer definitions:

package com.acme.search;

import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenizerFactory;
import org.hibernate.search.analyzer.definition.LuceneAnalysisDefinitionProvider;
import org.hibernate.search.analyzer.definition.LuceneAnalysisDefinitionRegistryBuilder;

public class CustomAnalyzerProvider implements LuceneAnalysisDefinitionProvider {
    @Override
    public void register(LuceneAnalysisDefinitionRegistryBuilder builder) {
        builder
                // Declares an analyzer definition named "myAnalyzer"
                .analyzer( "myAnalyzer" )
                        // Tokenizer: emits edge n-grams of 1 to 5 characters
                        .tokenizer( EdgeNGramTokenizerFactory.class )
                                .param( "minGramSize", "1" )
                                .param( "maxGramSize", "5" )
                        // Token filters, applied in this order
                        .tokenFilter( ASCIIFoldingFilterFactory.class )
                        .tokenFilter( LowerCaseFilterFactory.class );
    }
}
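To get a feel for what "myAnalyzer" does to a field value, here is a plain-Java simulation of the chain (no Lucene involved; EdgeNGramSketch is a hypothetical name). It assumes the edge n-gram tokenizer treats the whole input as a single token and emits its prefixes, which is how recent Lucene versions behave by default, and it only approximates ASCIIFoldingFilterFactory by stripping accents.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java simulation of the "myAnalyzer" chain (no Lucene involved):
// edge n-gram tokenizer (1..5 chars), then ASCII folding, then lowercasing.
public class EdgeNGramSketch {

    static final int MIN_GRAM = 1;
    static final int MAX_GRAM = 5;

    // Tokenizer step: prefixes of the WHOLE input (assumes every character,
    // including spaces, counts as part of a single token).
    public static List<String> edgeNGrams(String input, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, input.length());
        for (int length = minGram; length <= longest; length++) {
            grams.add(input.substring(0, length));
        }
        return grams;
    }

    // Rough stand-in for ASCIIFoldingFilterFactory: decompose, then drop accents.
    // The real filter also handles characters such as sharp s or ae ligatures.
    public static String asciiFold(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Whole chain: tokenizer first, then the token filters in declaration order.
    public static List<String> analyze(String input) {
        List<String> tokens = new ArrayList<>();
        for (String gram : edgeNGrams(input, MIN_GRAM, MAX_GRAM)) {
            tokens.add(asciiFold(gram).toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Input is "Eleonore Dupont" with accented E/e, written with Unicode escapes
        System.out.println(analyze("\u00C9l\u00E9onore Dupont"));
        // prints [e, el, ele, eleo, eleon]
    }
}
```

If that assumption about the tokenizer holds, only the first five characters of the whole value are ever indexed, so "Dupont" would not be searchable at all in that field: one more reason not to make this the default analyzer.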

Then define the following properties in your persistence.xml:

<property name="hibernate.search.lucene.analysis_definition_provider">com.acme.search.CustomAnalyzerProvider</property>
<property name="hibernate.search.analyzer">myAnalyzer</property>

And you should be good to go.
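If you bootstrap JPA in code rather than relying only on persistence.xml, the same two settings can presumably be passed as a map to javax.persistence.Persistence.createEntityManagerFactory(String, Map). The minimal sketch below only builds that map, so it runs without JPA on the classpath; SearchSettings and the persistence unit name "my-unit" are hypothetical. Note that both values are plain strings: a fully qualified class name for the provider and a definition name for the analyzer, not Java expressions such as EdgeNGramTokenizerFactory.class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper holding the same two settings as the persistence.xml snippet.
public class SearchSettings {

    public static Map<String, String> luceneAnalyzerSettings() {
        Map<String, String> settings = new HashMap<>();
        // Fully qualified name of the class that provides the "myAnalyzer" definition.
        settings.put("hibernate.search.lucene.analysis_definition_provider",
                "com.acme.search.CustomAnalyzerProvider");
        // The default analyzer is referenced by definition name, not by class.
        settings.put("hibernate.search.analyzer", "myAnalyzer");
        return settings;
    }

    public static void main(String[] args) {
        Map<String, String> settings = luceneAnalyzerSettings();
        // With JPA on the classpath, this map could be passed as the second argument of
        // javax.persistence.Persistence.createEntityManagerFactory("my-unit", settings);
        settings.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```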

EDIT: If you use the Elasticsearch integration, then 1) using a custom Lucene Analyzer class will never work and 2) you need to do the following to define named analyzers instead:

Define this class, which is not an analyzer class, but rather a class that provides analyzer definitions:

package com.acme.search;

import org.hibernate.search.elasticsearch.analyzer.definition.ElasticsearchAnalysisDefinitionProvider;
import org.hibernate.search.elasticsearch.analyzer.definition.ElasticsearchAnalysisDefinitionRegistryBuilder;

public class CustomAnalyzerProvider implements ElasticsearchAnalysisDefinitionProvider {
    @Override
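    public void register(ElasticsearchAnalysisDefinitionRegistryBuilder builder) {
        // Analyzer "myAnalyzer": custom tokenizer, then token filters in order
        builder.analyzer( "myAnalyzer" )
                .withTokenizer( "myEdgeNgram" )
                // "asciifolding" is an Elasticsearch token filter, not a char filter
                .withTokenFilters( "asciifolding", "lowercase" );

        // The custom tokenizer referenced above: edge n-grams of 1 to 5 characters
        builder.tokenizer( "myEdgeNgram" )
                .type( "edge_ngram" )
                .param( "min_gram", "1" )
                .param( "max_gram", "5" );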
    }
}

Then define the following properties in your persistence.xml (note that the provider property name differs from the one in my Lucene example):

<property name="hibernate.search.elasticsearch.analysis_definition_provider">com.acme.search.CustomAnalyzerProvider</property>
<property name="hibernate.search.analyzer">myAnalyzer</property>

If you need more information, the documentation might help.

Upvotes: 5
