Using an Analyzer within a custom FieldBridge

Question

I have a getter method returning a List<String> that I want to index (tokenized) into a number of fields.

I have a FieldBridge implementation that iterates over the list and indexes each string into a field with the index appended to the field name to give a different name for each.

I have two different Analyzer implementations (CaseSensitiveNGramAnalyzer and CaseInsensitiveNGramAnalyzer) that I want to use with this FieldBridge (to make a case-sensitive and a case-insensitive index of the field).

This is the FieldBridge I want to apply the Analyzers to:

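import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.hibernate.search.bridge.FieldBridge;
import org.hibernate.search.bridge.LuceneOptions;

public class StringListBridge implements FieldBridge
{

   @Override
   @SuppressWarnings("unchecked")
   public void set(String name, Object value, Document luceneDocument, LuceneOptions luceneOptions)
   {
      // The annotated getter returns a List<String>
      List<String> strings = (List<String>) value;
      for (int i = 0; i < strings.size(); i++)
      {
         // Append the element index so each string gets its own field name
         addStringField(name + i, strings.get(i), luceneDocument, luceneOptions);
      }
   }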

   private void addStringField(String fieldName, String fieldValue, Document luceneDocument, LuceneOptions luceneOptions)
   {
      Field field = new Field(fieldName, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector());
      field.setBoost(luceneOptions.getBoost());
      luceneDocument.add(field);
   }
}
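
To make the naming scheme concrete, here is a minimal sketch in plain Java, with no Lucene or Hibernate Search types involved, of the field names the bridge is intended to produce. The class FieldNameSketch and its fieldNames helper are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hibernate Search: it only reproduces the
// intended naming loop so the resulting field names are visible.
final class FieldNameSketch
{
   // Returns one field name per list element: the base name plus the element index.
   static List<String> fieldNames(String name, List<String> strings)
   {
      List<String> names = new ArrayList<String>();
      for (int i = 0; i < strings.size(); i++)
      {
         names.add(name + i);
      }
      return names;
   }

   public static void main(String[] args)
   {
      // prints [content-case0, content-case1, content-case2]
      System.out.println(fieldNames("content-case", Arrays.asList("foo", "bar", "baz")));
   }
}
```

So a list of three strings under the field name content-case would end up in three separate Lucene fields, each of which would need to be analyzed.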

Is it possible to apply an Analyzer to a field that uses a FieldBridge?
If so, can this be done with annotations, or does it have to be done programmatically?
If the latter, can I inject the Analyzer as a parameter?

I am thinking along the lines of the following, but am not at all familiar with field token streams etc.:

   private void addStringField(String fieldName, String fieldValue, Document luceneDocument, LuceneOptions luceneOptions)
   {
      Field field = new Field(fieldName, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector());
      field.setBoost(luceneOptions.getBoost());
      try
      {
         field.setTokenStream(new CaseSensitiveNGramAnalyzer().reusableTokenStream(fieldName, new StringReader(fieldValue)));
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
      luceneDocument.add(field);
   }

Is this a sane approach?

EDIT: I have tried specifying the Analyzer and FieldBridge within a @Field annotation (without the analyzer code above), as follows, but it appears to use the default analyzer rather than the ones specified with analyzer =.

   @Fields({
      @Field(name = "content-nocase",
             index = Index.TOKENIZED,
             analyzer = @Analyzer(impl = CaseInsensitiveNGramAnalyzer.class),
             bridge = @FieldBridge(impl = StringListBridge.class)),
      @Field(name = "content-case",
             index = Index.TOKENIZED,
             analyzer = @Analyzer(impl = CaseSensitiveNGramAnalyzer.class),
             bridge = @FieldBridge(impl = StringListBridge.class)),
   })
   public List<String> getContents()

David Mason · Accepted Answer

I managed to get this working. Hibernate Search appears not to use the specified Analyzer when both analyzer = and bridge = are specified, at least if the specified bridge creates multiple fields.

Manually passing the TokenStream from the desired analyzer to the generated Fields in the bridge got me the expected result:

   private void addStringField(String fieldName, String fieldValue, Document luceneDocument, LuceneOptions luceneOptions)
   {
      Field field = new Field(fieldName, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector());
      field.setBoost(luceneOptions.getBoost());

      // manually apply token stream from analyzer, as hibernate search does not
      // apply the specified analyzer properly
      try
      {
         field.setTokenStream(analyzer.reusableTokenStream(fieldName, new StringReader(fieldValue)));
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
      luceneDocument.add(field);
   }

The bridge also implements ParameterizedBridge so that the analyzer to use can be passed in as a parameter; the analyzer is instantiated and stored in the analyzer field before addStringField is called.
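
A minimal sketch of how a bridge parameter might select the analyzer follows. It is not the original bridge code: TextAnalyzer, LowerCasingAnalyzer and AnalyzerParamSketch are hypothetical stand-ins (for Lucene's Analyzer and for the bridge itself) so that the example runs without Lucene or Hibernate Search on the classpath, and the parameter key "analyzer" is an assumption. Only the setParameterValues(Map) method mirrors what ParameterizedBridge declares.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for org.apache.lucene.analysis.Analyzer.
interface TextAnalyzer
{
   String analyze(String text);
}

// Hypothetical analyzer used only in this sketch; it lower-cases its input.
final class LowerCasingAnalyzer implements TextAnalyzer
{
   public String analyze(String text)
   {
      return text.toLowerCase(Locale.ROOT);
   }
}

// Hypothetical stand-in for a bridge that also implements ParameterizedBridge.
final class AnalyzerParamSketch
{
   private TextAnalyzer analyzer;

   // Same shape as ParameterizedBridge.setParameterValues: read the analyzer
   // class name from the parameters and instantiate it once, up front.
   public void setParameterValues(Map<String, String> parameters)
   {
      String className = parameters.get("analyzer"); // assumed parameter key
      if (className == null)
      {
         throw new IllegalArgumentException("Missing 'analyzer' parameter");
      }
      try
      {
         analyzer = (TextAnalyzer) Class.forName(className).getDeclaredConstructor().newInstance();
      }
      catch (ReflectiveOperationException e)
      {
         throw new IllegalArgumentException("Cannot instantiate analyzer: " + className, e);
      }
   }

   TextAnalyzer getAnalyzer()
   {
      return analyzer;
   }

   public static void main(String[] args)
   {
      Map<String, String> params = new HashMap<String, String>();
      params.put("analyzer", LowerCasingAnalyzer.class.getName());

      AnalyzerParamSketch bridge = new AnalyzerParamSketch();
      bridge.setParameterValues(params);

      // prints foobar
      System.out.println(bridge.getAnalyzer().analyze("FooBar"));
   }
}
```

In a real mapping the parameter would presumably be supplied through params = @Parameter(name = ..., value = ...) on the @FieldBridge annotation, with the fully qualified analyzer class name as the value, so each @Field could name a different analyzer.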
