Implement Lucene analysers with FieldBridges

Question

I want to implement Lucene analysers in a way that works well with FieldBridges and with manual searching. Ideally I want as little code duplication as possible.

I know most tutorials tell you to define your analysers with the @AnalyzerDef annotation. I did this and got everything working, but I could not get fields created in FieldBridges (with luceneOptions.addFieldToDocument) to respect the analysers.

I have tried to find another way of doing this but the documentation is sparse.

This is what I have come up with (to keep this post short some code has been omitted, but I will post more if requested):

Creating Analyser:

    public static org.apache.lucene.analysis.Analyzer getEnglishWordAnalyser() {
        org.apache.lucene.analysis.Analyzer analyser = null;

        try {
            analyser = CustomAnalyzer.builder()
                    .addCharFilter(HTMLStripCharFilterFactory.class)
                    .addCharFilter(MappingCharFilterFactory.class, getMappingSettings())
                    .withTokenizer(StandardTokenizerFactory.class)
                    .addTokenFilter(StandardFilterFactory.class)
                    .addTokenFilter(LowerCaseFilterFactory.class)
                    .addTokenFilter(SnowballPorterFilterFactory.class, getSnowballPorterSettings())
                    .addTokenFilter(SynonymFilterFactory.class, getSynonymSettings())
                    .addTokenFilter(ASCIIFoldingFilterFactory.class)
                    .addTokenFilter(PhoneticFilterFactory.class, getPhoneticSettings())
                    .addTokenFilter(StopFilterFactory.class, getStopSettings())
                    .build();

        } catch (IOException ex) {
            logger.info("[SearchConfig] [englishWordAnalyser] Failed to create components", ex);
        }

        return analyser;
    }

Creating Field With Analyser:

    protected StringField createStringField(String name, String value, LuceneOptions luceneOptions) {
        final StringField field = new StringField(name, value, luceneOptions.getStore());

        final Analyzer analyzer = SearchConfig.getEnglishWordAnalyser();

        try {
            final TokenStream tokenStream = analyzer.tokenStream(name, new StringReader(value));
            tokenStream.reset();

            field.setBoost(luceneOptions.getBoost());
            field.setTokenStream(tokenStream);
            field.setStringValue(value);

            tokenStream.end();
            tokenStream.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

        analyzer.close();
        return field;
    }

Adding a new field from a FieldBridge:

    createStringField("NAME", "VALUE", luceneOptions);

I also want to be able to use this Analyser when creating a MultiFieldQueryParser like this:

    final QueryParser parser = new MultiFieldQueryParser(getClassLuceneFields(clazz), getEnglishWordAnalyser());

I tested the analyser with MultiFieldQueryParser and it seems to work well, but when the fields from the FieldBridges are indexed, it fails with this error:

java.lang.IllegalArgumentException: TokenStream fields must be indexed and tokenized

It is thrown by the setTokenStream call in createStringField.

Does anyone have any ideas?

I could be going in completely the wrong direction; if so, does anyone have an alternative that also suits my use case?

Cheers

yrodiere · Accepted Answer

I'm afraid that's not how Lucene works. Lucene expects you to build documents with non-analyzed values in their fields, and it will take care of analyzing the documents when you put them in an index.

Hibernate Search takes care of setting up the proper configuration so that Lucene knows which analyzer to use for each field. It happens that this is easy to configure for standard @Field fields (@Field(analyzer = ...)), but not for fields added in field bridges.
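To make that division of labour concrete, here is a toy model in plain Java. It is only a sketch of the idea and does not use Lucene or Hibernate Search at all: the class `PerFieldAnalysisSketch`, its methods, and the field names `title` and `sku` are all made up for illustration. The document holds raw values, and the indexing step looks up the analyzer for each field name and applies it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model only: none of these types are Lucene or Hibernate Search classes.
class PerFieldAnalysisSketch {

    // Hypothetical stand-in for an analyzer: raw text in, index terms out.
    static List<String> lowercaseWords(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Hypothetical stand-in for "no analysis": the whole value is one term.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // The indexing step owns the field -> analyzer mapping, not the document.
    static Map<String, List<String>> index(
            Map<String, String> rawDocument,
            Map<String, Function<String, List<String>>> analyzersByField) {
        Map<String, List<String>> indexedTerms = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : rawDocument.entrySet()) {
            // Fields without a configured analyzer fall back to keyword().
            Function<String, List<String>> analyzer =
                    analyzersByField.getOrDefault(field.getKey(), PerFieldAnalysisSketch::keyword);
            indexedTerms.put(field.getKey(), analyzer.apply(field.getValue()));
        }
        return indexedTerms;
    }

    public static void main(String[] args) {
        // The document carries raw, non-analyzed values only.
        Map<String, String> document = new LinkedHashMap<>();
        document.put("title", "Hello, Lucene World");
        document.put("sku", "AB-123");

        Map<String, Function<String, List<String>>> analyzers = new LinkedHashMap<>();
        analyzers.put("title", PerFieldAnalysisSketch::lowercaseWords);

        // prints {title=[hello, lucene, world], sku=[AB-123]}
        System.out.println(index(document, analyzers));
    }
}
```

In this model, pre-tokenizing a value inside the document (as createStringField tries to do) is unnecessary: what matters is that the field name is mapped to the right analyzer at indexing time.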

Currently, the easiest solution would be the third one described in this blog post: analyzer discriminators. This is not the intended purpose of analyzer discriminators, but it will work.

Basically you will have to:

Define analyzers using @AnalyzerDef as usual

Create an analyzer discriminator that maps your fields to the corresponding analyzer definition:

    import org.hibernate.search.analyzer.Discriminator;

    public class MyDiscriminator implements Discriminator {
        @Override
        public String getAnalyzerDefinitionName(Object value, Object entity, String fieldName) {
            // Map each index field name to the name of an @AnalyzerDef definition
            switch ( fieldName ) {
            case "foo":
                return "analyzerNameForFieldFoo";
            case "bar":
                return "analyzerNameForFieldBar";
            default:
                return null; // Use the default analyzer
            }
        }
    }

Apply the discriminator to your entity:

    @Indexed
    @Entity
    @AnalyzerDiscriminator(impl = MyDiscriminator.class)
    public class MyEntity {
        // ...
    }

See here for more documentation about analyzer discriminators.
