Hibernate Search multiple fields with @ClassBridge

Question

First of all, Happy New Year!

I'd like to index an entity's label in multiple languages.

I have two entities:

MyEntity
    labelCode

Translation
    code
    languageCode
    label

MyEntity.labelCode must match Translation.code, so each MyEntity instance has several labels, one per language.

I wrote a ClassBridge on MyEntity to add multiple fields to the document:

class I18NTranslationClassBridge implements FieldBridge {

Analyzer analyzer

@Override
void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
    if (value && value instanceof I18NDictionaryCategory) {
        I18NDictionaryCategory entity = value as I18NDictionaryCategory

        String labelCode = entity.getLabelCode()
        def translations = TranslationData.findAllByCode(labelCode)
        if (!analyzer) analyzer = Search.getFullTextSession(Holders.getApplicationContext().sessionFactory.currentSession).getSearchFactory().getAnalyzer('wildcardAnalyzer')
        translations?.each { translation ->
            document.add(getStringField("labelCode_${translation.languageCode}", translation.label, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO, 1f, analyzer))
            document.add(getStringField("labelCode__${translation.languageCode}_full", translation.label, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS, Field.TermVector.NO, 1f, null))
        }

    }
}

private static Field getStringField(String fieldName, String fieldValue, Field.Store store, Field.Index index, Field.TermVector termVector, float boost, Analyzer analyzer) {
    Field field = new Field(fieldName, fieldValue, store, index, termVector);
    field.setBoost(boost);
    // manually apply token stream from analyzer, as hibernate search does not
    // apply the specified analyzer properly
    if (analyzer) {
        try {
            field.setTokenStream(analyzer.reusableTokenStream(fieldName, new StringReader(fieldValue)));
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }
    return field
}

}

I'd like to index two fields per language: one that is neither analyzed nor tokenized (for sorting results) and another that is tokenized (for full-text search).

My problem is that the fields without an analyzer are all indexed correctly, but the fields with an analyzer are not: only one language is correctly indexed.

I have tried both a ClassBridge and a FieldBridge, without success.

Any suggestions?

Best regards,

Léo

Hardy · Accepted Answer

You should not use an Analyzer within a class or field bridge. Analyzers are applied at a later stage: Hibernate Search collects all required analyzers in a so-called ScopedAnalyzer, which is used when the Lucene Document is added to the index. To support your use case, you can make use of the dynamic analyzer selection feature. See also http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#d0e4119.

The basic approach is to define the language-specific analyzers via @AnalyzerDef. This makes them globally available by name. Then you need to implement org.hibernate.search.analyzer.Discriminator. You basically return the right analyzer name depending on the field name (assuming that the field names contain the language code in some form). Last but not least, you need to annotate MyEntity with @AnalyzerDiscriminator(impl = MyDiscriminator.class).
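As a rough illustration of the "return the right analyzer name depending on the field name" step, here is a minimal sketch in Java. It only shows the name-mapping logic. In a real project the class would implement org.hibernate.search.analyzer.Discriminator, which is left out here so the example compiles without Hibernate Search on the classpath. The class name and the analyzer names (englishAnalyzer, frenchAnalyzer, germanAnalyzer) are hypothetical, and each name would need a matching analyzer definition. The field naming follows the bridge in the question.

```java
import java.util.Locale;

/**
 * Sketch of the mapping a Discriminator implementation could carry.
 * Assumption: analyzed fields are named "labelCode_<lang>" and the
 * sort fields end in "_full", as in the bridge from the question.
 */
class LanguageDiscriminatorSketch {

    private static final String PREFIX = "labelCode_";
    private static final String SORT_SUFFIX = "_full";

    // Same shape as Discriminator#getAnalyzerDefinitionName(value, entity, field).
    // Returning null leaves the default analyzer for that field in place.
    public String getAnalyzerDefinitionName(Object value, Object entity, String field) {
        if (field == null || !field.startsWith(PREFIX) || field.endsWith(SORT_SUFFIX)) {
            return null; // not a per-language full-text field
        }
        String language = field.substring(PREFIX.length()).toLowerCase(Locale.ROOT);
        switch (language) {
            // Hypothetical analyzer names, one per supported language.
            case "en": return "englishAnalyzer";
            case "fr": return "frenchAnalyzer";
            case "de": return "germanAnalyzer";
            default:   return null; // unknown language: keep the default analyzer
        }
    }
}
```

With this in place the bridge itself would only add plain fields and would no longer touch any Analyzer. If a single analyzer is enough for every language, the switch could simply return 'wildcardAnalyzer' for all analyzed fields.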
