Customize EdgeNGramFilter minGramSize and maxGramSize in Hibernate Search 6.1.8 Final with Lucene backend

Question

I am trying to implement autocomplete, inspired by the "Search analyzer" section of the Hibernate Search 6.0.0.Beta2 release.

This is the example from that release that I am trying to follow:

@Entity
@Indexed
public class Book {

    @Id
    private Long id;

    @FullTextField(
            name = "title_autocomplete",
            analyzer = "autocomplete",
            searchAnalyzer = "autocomplete_query"
    )
    private String title;

    // ... getters and setters ...
}

To define an analyzer named "autocomplete" and a search analyzer named "autocomplete_query", I followed section 10.6.4, "Custom analyzers and normalizers", of the documentation, defined the following custom Lucene analysis configurer, and created a new persistence.xml.

public class CustomLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer {

    @Override
    public void configure(LuceneAnalysisConfigurationContext context) {
        context.analyzer("autocomplete").custom()
            .tokenizer(StandardTokenizerFactory.class)
            .charFilter(HTMLStripCharFilterFactory.class)
            .tokenFilter(LowerCaseFilterFactory.class)
            .param("language", "English")
            .tokenFilter(ASCIIFoldingFilterFactory.class)
            .tokenFilter(EdgeNGramFilterFactory.class);

        context.analyzer("autocomplete_query").custom()
            .tokenizer(StandardTokenizerFactory.class)
            .charFilter(HTMLStripCharFilterFactory.class)
            .tokenFilter(LowerCaseFilterFactory.class)
            .param("language", "English")
            .tokenFilter(ASCIIFoldingFilterFactory.class);
    }
}

My question is: is there a way to set minGramSize and maxGramSize for the EdgeNGramFilterFactory in the configurer above? I've gone through the official documentation but found no information on how to do this.

mark_o · Accepted Answer

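You can do this the same way you pass the language parameter in your configurer. tokenFilter() returns a DSL step that exposes a param() method, and each param() call applies to the component declared immediately before it, so you can pass any filter-specific parameter there:

import org.apache.lucene.analysis.charfilter.HTMLStripCharFilterFactory;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
import org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurationContext;
import org.hibernate.search.backend.lucene.analysis.LuceneAnalysisConfigurer;

public class CustomLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer {

    @Override
    public void configure(LuceneAnalysisConfigurationContext context) {
        // Index-time analyzer: each term is expanded into its leading n-grams.
        // Note: the "language" parameter from the question is left out here, because
        // LowerCaseFilterFactory takes no parameters and Lucene factories reject unknown ones.
        context.analyzer("autocomplete").custom()
            .tokenizer(StandardTokenizerFactory.class)
            .charFilter(HTMLStripCharFilterFactory.class)
            .tokenFilter(LowerCaseFilterFactory.class)
            .tokenFilter(ASCIIFoldingFilterFactory.class)
            .tokenFilter(EdgeNGramFilterFactory.class)
                    .param("minGramSize", "3")
                    .param("maxGramSize", "7");

        // Query-time analyzer: same pipeline, but without the edge n-gram filter.
        context.analyzer("autocomplete_query").custom()
            .tokenizer(StandardTokenizerFactory.class)
            .charFilter(HTMLStripCharFilterFactory.class)
            .tokenFilter(LowerCaseFilterFactory.class)
            .tokenFilter(ASCIIFoldingFilterFactory.class);
    }
}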

If you are unsure about the parameter names, open the source of the filter factory class (here, EdgeNGramFilterFactory) and look at its constructor that accepts a Map<String, String>: the names it reads from that map are the strings to pass to param().
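
To get a feel for what minGramSize and maxGramSize do to an indexed term, here is a minimal plain-Java sketch that mimics edge n-gram generation. It does not use Lucene or Hibernate Search: the class name EdgeNGramSketch and the method edgeNGrams are hypothetical, made up for illustration. It assumes the filter's behaviour without preserveOriginal, and it counts Java chars rather than code points, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene or Hibernate Search.
public class EdgeNGramSketch {

    // Returns the prefixes of `term` whose length lies between
    // minGramSize and maxGramSize (inclusive), shortest first.
    static List<String> edgeNGrams(String term, int minGramSize, int maxGramSize) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGramSize, term.length());
        for (int size = minGramSize; size <= longest; size++) {
            grams.add(term.substring(0, size));
        }
        return grams;
    }

    public static void main(String[] args) {
        // With minGramSize=3 and maxGramSize=7, as in the configurer above.
        System.out.println(edgeNGrams("hibernate", 3, 7)); // [hib, hibe, hiber, hibern, hiberna]
        // A term shorter than minGramSize yields no grams in this sketch.
        System.out.println(edgeNGrams("hi", 3, 7));        // []
    }
}
```

Under these assumptions, a user typing "hib" matches a title containing "hibernate" because "hib" is one of the indexed grams. Input shorter than three characters, or a prefix longer than seven, would not match any gram. That is worth keeping in mind when choosing the two values.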
