David Mason

Using an Analyzer within a custom FieldBridge

I have a getter that returns a List, and I want to index its elements (tokenized) into a number of fields.

I have a FieldBridge implementation that iterates over the list and indexes each string into its own field, appending the element's index to the field name so that each field gets a different name.
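The naming scheme described above can be sketched without any Lucene or Hibernate Search types. This is a minimal illustration only: a LinkedHashMap stands in for the Lucene Document, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stdlib-only sketch of the per-element naming scheme; no Lucene types.
// A LinkedHashMap stands in for the Lucene Document (field name -> value).
class IndexedFieldNames {

    // Appends the element index i to the base name, so each list element
    // gets its own field name: name0, name1, ...
    static Map<String, String> fieldsFor(String name, List<String> values) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            fields.put(name + i, values.get(i));
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints {content-case0=Foo, content-case1=bar}
        System.out.println(fieldsFor("content-case", Arrays.asList("Foo", "bar")));
    }
}
```

If the suffix were a constant instead of the loop index, every element would be added under the same name; a Lucene Document accepts several fields with one name, so the result would be a single multi-valued field rather than one field per element.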

I have two different Analyzer implementations (CaseSensitiveNGramAnalyzer and CaseInsensitiveNGramAnalyzer) that I want to use with this FieldBridge (to make a case-sensitive and a case-insensitive index of the field).
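The two analyzers themselves are not shown here, so as a rough illustration only, the sketch below shows what case-sensitive versus case-insensitive character n-grams could look like for the same input. It is not Lucene's NGramTokenizer, and treating "lowercase first" as the only difference between the two analyzers is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: NOT Lucene's NGramTokenizer and not the real
// CaseSensitiveNGramAnalyzer / CaseInsensitiveNGramAnalyzer.
class NGramSketch {

    // Emits every contiguous substring of length n (character n-grams).
    // lowerCase = true lowercases the input first, which is the assumed
    // difference between the case-insensitive and case-sensitive variants.
    static List<String> ngrams(String text, int n, boolean lowerCase) {
        String s = lowerCase ? text.toLowerCase(Locale.ROOT) : text;
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("FooBar", 3, false)); // [Foo, ooB, oBa, Bar]
        System.out.println(ngrams("FooBar", 3, true));  // [foo, oob, oba, bar]
    }
}
```

Indexing both variants of the same text is what lets one field answer exact-case queries while the other matches regardless of case.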

This is the FieldBridge I want to apply the Analyzers to:

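import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.hibernate.search.bridge.FieldBridge;
import org.hibernate.search.bridge.LuceneOptions;

public class StringListBridge implements FieldBridge
{

   @Override
   @SuppressWarnings("unchecked")
   public void set(String name, Object value, Document luceneDocument, LuceneOptions luceneOptions)
   {
      if (value == null)
      {
         return; // nothing to index
      }
      List<String> strings = (List<String>) value;
      // append the element index i so each element gets its own field: name0, name1, ...
      for (int i = 0; i < strings.size(); i++)
      {
         addStringField(name + i, strings.get(i), luceneDocument, luceneOptions);
      }
   }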

   private void addStringField(String fieldName, String fieldValue, Document luceneDocument, LuceneOptions luceneOptions)
   {
      Field field = new Field(fieldName, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector());
      field.setBoost(luceneOptions.getBoost());
      luceneDocument.add(field);
   }
}

I am thinking along the lines of the following, but I am not at all familiar with Lucene fields, token streams and so on:

   private void addStringField(String fieldName, String fieldValue, Document luceneDocument, LuceneOptions luceneOptions)
   {
      Field field = new Field(fieldName, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector());
      field.setBoost(luceneOptions.getBoost());
      try
      {
         field.setTokenStream(new CaseSensitiveNGramAnalyzer().reusableTokenStream(fieldName, new StringReader(fieldValue)));
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
      luceneDocument.add(field);
   }

Is this a sane approach?

EDIT: I have tried specifying the Analyzer and FieldBridge in a @Field annotation (without the analyzer code above), as follows, but it appears to use the default analyzer rather than the ones specified with analyzer =.

   @Fields({
      @Field(name = "content-nocase",
             index = Index.TOKENIZED,
             analyzer = @Analyzer(impl = CaseInsensitiveNGramAnalyzer.class),
             bridge = @FieldBridge(impl = StringListBridge.class)),
      @Field(name = "content-case",
             index = Index.TOKENIZED,
             analyzer = @Analyzer(impl = CaseSensitiveNGramAnalyzer.class),
             bridge = @FieldBridge(impl = StringListBridge.class)),
   })
   public List<String> getContents()

Answers (2)

Hardy

The solution at the moment is either a custom scoped analyzer or @AnalyzerDiscriminator together with @AnalyzerDef. This is also discussed on the Hibernate Search forum: https://forum.hibernate.org/viewtopic.php?f=9&t=1016667
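As a rough sketch of the @AnalyzerDiscriminator idea, the class below has the same method shape as Hibernate Search's Discriminator callback, getAnalyzerDefinitionName(value, entity, field), but it is plain Java and does not implement that interface. The definition names "caseInsensitiveNGram" and "caseSensitiveNGram" are hypothetical and would have to match @AnalyzerDef names on the entity; whether the discriminator is consulted for the suffixed field names a custom bridge produces is not shown here and is worth verifying.

```java
// Stdlib-only sketch of the @AnalyzerDiscriminator idea: pick an analyzer
// definition name from the field name. Not the real Discriminator interface;
// the definition names returned below are hypothetical.
class ContentFieldDiscriminator {

    String getAnalyzerDefinitionName(Object value, Object entity, String field) {
        if (field == null) {
            return null; // null: do not override the analyzer
        }
        if (field.startsWith("content-nocase")) {
            return "caseInsensitiveNGram";
        }
        if (field.startsWith("content-case")) {
            return "caseSensitiveNGram";
        }
        return null;
    }

    public static void main(String[] args) {
        ContentFieldDiscriminator d = new ContentFieldDiscriminator();
        System.out.println(d.getAnalyzerDefinitionName("Foo", null, "content-nocase0")); // caseInsensitiveNGram
        System.out.println(d.getAnalyzerDefinitionName("Foo", null, "content-case0"));   // caseSensitiveNGram
    }
}
```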

David Mason

I managed to get this working. Hibernate Search appears not to use the specified Analyzer when both analyzer = and bridge = are given in the @Field annotation, at least when the bridge creates multiple fields.

Manually passing the TokenStream from the desired analyzer to the generated Fields in the bridge got me the expected result:

   private void addStringField(String fieldName, String fieldValue, Document luceneDocument, LuceneOptions luceneOptions)
   {
      Field field = new Field(fieldName, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector());
      field.setBoost(luceneOptions.getBoost());

      // manually apply token stream from analyzer, as hibernate search does not
      // apply the specified analyzer properly.
      // In Lucene 3.x, tokenStream() builds a new stream on each call (and declares
      // no IOException); reusableTokenStream() may return the same per-thread
      // instance for every field added to this document.
      field.setTokenStream(analyzer.tokenStream(fieldName, new StringReader(fieldValue)));
      luceneDocument.add(field);
   }

The bridge also implements ParameterizedBridge, and its parameter specifies which analyzer to use (the analyzer is instantiated and stored in the analyzer field before this method is called).
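The flow described above (configure once from parameters, then apply the chosen analyzer to every generated field) can be modelled in plain Java. This is a sketch under stated assumptions: setParameterValues only mirrors the ParameterizedBridge method name, the "caseSensitive" parameter is hypothetical, and whitespace splitting stands in for the real n-gram analyzers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Stdlib-only model of the answer's approach; it does not use Hibernate Search.
// The "caseSensitive" parameter and whitespace "analyzers" are hypothetical.
class ParameterizedBridgeSketch {

    // Chosen once from the parameters, then reused for every field.
    private Function<String, List<String>> analyzer;

    void setParameterValues(Map<String, String> parameters) {
        boolean caseSensitive = Boolean.parseBoolean(parameters.get("caseSensitive"));
        analyzer = caseSensitive
                ? text -> Arrays.asList(text.split("\\s+"))
                : text -> Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Analyzes each element with the configured analyzer and stores the terms
    // under name + i; the returned map stands in for the Lucene Document.
    Map<String, List<String>> set(String name, List<String> values) {
        Map<String, List<String>> document = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            document.put(name + i, analyzer.apply(values.get(i)));
        }
        return document;
    }

    public static void main(String[] args) {
        ParameterizedBridgeSketch bridge = new ParameterizedBridgeSketch();
        bridge.setParameterValues(Collections.singletonMap("caseSensitive", "false"));
        // prints {content-nocase0=[hello, world], content-nocase1=[foo]}
        System.out.println(bridge.set("content-nocase", Arrays.asList("Hello World", "Foo")));
    }
}
```

In the real mapping, each @Field would then reference the same bridge class with a different @Parameter value, so one property yields both the case-sensitive and the case-insensitive fields.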
