user1034461

Creating and using LuceneAnalysisDefinitionProvider with Hibernate Search

When you search Stack Overflow or the Internet for LuceneAnalysisDefinitionProvider, you'll find hundreds of pages, each of them having the same code copied from another page without any decent explanation or further examples of usage.

So I tried to do it by myself and failed. Here is my code:

public class CustomLuceneAnalysisDefinitionProvider
        implements LuceneAnalysisDefinitionProvider {

  @Override
  public void register(final LuceneAnalysisDefinitionRegistryBuilder builder) {
    builder
      .analyzer("customAnalyzer")
        .tokenizer(StandardTokenizerFactory.class)
        .charFilter(MappingCharFilterFactory.class)
          .param("mapping",
            "org/hibernate/search/test/analyzer/mapping-chars.properties")
        .tokenFilter(ASCIIFoldingFilterFactory.class)
        .tokenFilter(LowerCaseFilterFactory.class)
        .tokenFilter(StopFilterFactory.class)
          // WRONG! It's not "mapping"!
//        .param("mapping",
//          "org/hibernate/search/test/analyzer/stoplist.properties")
          .param("words",
            "classpath:/stoplist.properties")
          .param("ignoreCase", "true");
  }

}

Now that we have CustomLuceneAnalysisDefinitionProvider, what's next?

  1. Where to put and how to address mapping-chars.properties when adding it as a parameter to MappingCharFilterFactory?
  2. What are the contents of mapping-chars.properties, and how do I create my own or modify an existing one?
  3. Where to put stoplist.properties and how to address it when adding it as the mapping parameter to StopFilterFactory?
  4. How to add the previously defined customAnalyzer to the single @Field shown below?
@Field(
    index = Index.YES,
    analyze = Analyze.YES,
    store = Store.NO,
    bridge = @FieldBridge(impl = LocalizedFieldBridge.class)
)
private LocalizedField description;

On some pages I found the option to put this setting into application.properties:

hibernate.search.lucene.analysis_definition_provider = com.thevegcat.app.search.CustomAnalysisDefinitionProvider

But I don't want to replace the original analyzer; I just want to use a custom analyzer for a few specific properties.


EDIT#1

Looking into org.apache.lucene.analysis.core.StopFilterFactory line 86, one can notice it takes words as a key, not mapping.


EDIT#2

If you put your stop words file in src/main/resources, then you have to address it like this:

.param("words", "classpath:/stoplist.properties")

Upvotes: 3

Views: 297

Answers (1)

yrodiere

Reputation: 9977

you'll find hundreds of pages, each of them having the same code copied from another page without any decent explanation or further examples of usage.

Hibernate Search 5 had its problems, one of which was lack of documentation in some areas. Now that it's in maintenance mode, those problems are unlikely to get addressed.

There is some documentation for that feature in the Hibernate Search 5 documentation: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#section-programmatic-analyzer-definition

You'll get better documentation of that feature by migrating to Hibernate Search 6+.

That being said, most of your questions relate to Lucene features, so you probably won't find answers in Hibernate Search's documentation. You could find them in Lucene's documentation. How to find such documentation is explained in the Hibernate Search 6 documentation:

To know more about the behavior of these character filters, tokenizers and token filters, either browse the Lucene Javadoc or read the corresponding section on the Solr Wiki (you don’t need Solr to use these analyzers, it’s just that there is no documentation page for Lucene proper).


Where to put and how to address mapping-chars.properties when adding it as a parameter to MappingCharFilterFactory?

In your classpath.

What are the contents of mapping-chars.properties, and how do I create my own or modify an existing one?

That's the kind of thing that Lucene doesn't document, at least not clearly. Solr's documentation is better: https://solr.apache.org/guide/6_6/charfilterfactories.html#CharFilterFactories-solr.MappingCharFilterFactory
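To make the linked Solr page a bit more concrete: the mapping file it describes holds one rule per line in the form "source" => "target", with lines starting with # treated as comments. The following is only a rough, plain-Java illustration of what such rules express. It is not Lucene's MappingCharFilter, it ignores escape sequences, and the class name MappingRulesSketch and the sample rules are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: a simplified stand-in for what a mapping file
// expresses. This is NOT how Lucene implements MappingCharFilter.
class MappingRulesSketch {

    // One rule per line: "source" => "target" (escape sequences are not handled here).
    private static final Pattern RULE =
            Pattern.compile("^\\s*\"([^\"]+)\"\\s*=>\\s*\"([^\"]*)\"\\s*$");

    // Parses the text of a mapping file into source -> target rules,
    // skipping blank lines and lines that start with '#'.
    static Map<String, String> parse(String fileContents) {
        Map<String, String> rules = new LinkedHashMap<>();
        for (String line : fileContents.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            Matcher m = RULE.matcher(line);
            if (m.matches()) {
                rules.put(m.group(1), m.group(2));
            }
        }
        return rules;
    }

    // Applies the rules left to right, preferring the longest matching source.
    static String apply(Map<String, String> rules, String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            String best = null;
            for (String source : rules.keySet()) {
                if (input.startsWith(source, i)
                        && (best == null || source.length() > best.length())) {
                    best = source;
                }
            }
            if (best == null) {
                out.append(input.charAt(i));
                i++;
            } else {
                out.append(rules.get(best));
                i += best.length();
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical file contents: map a-acute to "a" and sharp s to "ss".
        String file = "# sample rules\n"
                + "\"\u00e1\" => \"a\"\n"
                + "\"\u00df\" => \"ss\"\n";
        Map<String, String> rules = parse(file);
        System.out.println(apply(rules, "Stra\u00dfe")); // prints "Strasse"
    }
}
```

To create your own file, you would presumably write rules in the same form and point the mapping parameter at that file instead.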

Where to put stoplist.properties and how to address it when adding it as the mapping parameter to StopFilterFactory?

Put it in the classpath, and pass the path to that file from the root of your classpath.
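If you are unsure whether a file really is reachable "from the root of your classpath", a quick check with plain Java can help before involving Hibernate Search at all. This is a minimal sketch; the class name ClasspathResourceCheck is made up, and the two paths in main are simply the ones from the question. It only tells you whether the class loader can see the file, not how Hibernate Search itself resolves the parameter.

```java
import java.net.URL;

// Diagnostic sketch: checks whether a resource is visible on the classpath.
class ClasspathResourceCheck {

    // Returns the URL of the resource at the given path from the classpath
    // root, or null if the class loader cannot find it.
    static URL locate(String pathFromClasspathRoot) {
        // ClassLoader lookups are always relative to the classpath root,
        // so a leading slash is dropped.
        String normalized = pathFromClasspathRoot.startsWith("/")
                ? pathFromClasspathRoot.substring(1)
                : pathFromClasspathRoot;
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(normalized);
    }

    public static void main(String[] args) {
        String[] paths = {
            "stoplist.properties",
            "org/hibernate/search/test/analyzer/mapping-chars.properties"
        };
        for (String path : paths) {
            URL url = locate(path);
            System.out.println(path + " -> " + (url == null ? "NOT on classpath" : url));
        }
    }
}
```

With a standard Maven layout, a file placed directly in src/main/resources normally ends up at the classpath root, so it would be looked up as just stoplist.properties.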

How to add the previously defined customAnalyzer to the single @Field shown below?

Well that is documented, at least: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#_referencing_named_analyzers

@Field(analyzer = @Analyzer(definition = "customAnalyzer"))

On some pages I found option to put this definition into application.properties:

hibernate.search.lucene.analysis_definition_provider = com.thevegcat.app.search.CustomAnalysisDefinitionProvider

But I don't want to replace the original analyzer; I just want to use a custom analyzer for a few specific properties.

You won't replace an "analyzer"; you will register an analysis definition provider, which adds analyzer definitions to Hibernate Search. Those definitions can then be referenced from @Field. Setting an analysis definition provider does not, in itself, change your mapping in any way.

Upvotes: 1
