Reputation: 2957
I have a getter that returns a List<String>, and I want to index each element (tokenized) into its own field.
I have a FieldBridge implementation that iterates over the list and indexes each string into a separate field, appending the element's list position to the field name so that each field gets a distinct name.
I have two different Analyzer implementations (CaseSensitiveNGramAnalyzer and CaseInsensitiveNGramAnalyzer) that I want to use with this FieldBridge (to make a case-sensitive and a case-insensitive index of the field).
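To illustrate the difference I expect between the two indexes, here is a rough plain-Java sketch that uses no Lucene types. The class name NGramSketch, the gram size of 3, and the lower-casing rule are assumptions made for illustration only; the real analyzers may tokenize differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the two analyzers; not the Lucene implementation.
final class NGramSketch {
    // Returns every contiguous substring of length n, optionally lower-cased first.
    static List<String> ngrams(String text, int n, boolean caseSensitive) {
        String source = caseSensitive ? text : text.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= source.length(); i++) {
            grams.add(source.substring(i, i + n));
        }
        return grams;
    }
}
```

With these assumptions, NGramSketch.ngrams("Lucene", 3, true) yields [Luc, uce, cen, ene], while passing false yields [luc, uce, cen, ene], so only the case-insensitive variant would match a lower-case query for "luc".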
This is the FieldBridge I want to apply the Analyzers to:
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.hibernate.search.bridge.FieldBridge;
import org.hibernate.search.bridge.LuceneOptions;

public class StringListBridge implements FieldBridge
{
    @Override
    public void set(String name, Object value, Document luceneDocument, LuceneOptions luceneOptions)
    {
        @SuppressWarnings("unchecked")
        List<String> strings = (List<String>) value;
        for (int i = 0; i < strings.size(); i++)
        {
            // append the list position (not the constant 1) so each field name is distinct
            addStringField(name + i, strings.get(i), luceneDocument, luceneOptions);
        }
    }

    private void addStringField(String fieldName, String fieldValue, Document luceneDocument, LuceneOptions luceneOptions)
    {
        Field field = new Field(fieldName, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector());
        field.setBoost(luceneOptions.getBoost());
        luceneDocument.add(field);
    }
}
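For clarity, this is the naming scheme the bridge is meant to produce, shown as a small plain-Java sketch without any Lucene or Hibernate Search types. FieldNameSketch and fieldsFor are hypothetical names used only here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that mirrors only the naming logic of the bridge.
final class FieldNameSketch {
    // Maps "<name><position>" to the list element at that position, in list order.
    static Map<String, String> fieldsFor(String name, List<String> values) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            fields.put(name + i, values.get(i));
        }
        return fields;
    }
}
```

So for the field name "content-case" and the list ["foo", "bar"], I would expect the fields content-case0 and content-case1.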
I am thinking along the lines of the following, but am not at all familiar with field token streams etc.:
private void addStringField(String fieldName, String fieldValue, Document luceneDocument, LuceneOptions luceneOptions)
{
    Field field = new Field(fieldName, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector());
    field.setBoost(luceneOptions.getBoost());
    try
    {
        field.setTokenStream(new CaseSensitiveNGramAnalyzer().reusableTokenStream(fieldName, new StringReader(fieldValue)));
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    luceneDocument.add(field);
}
Is this a sane approach?
EDIT: I have tried specifying the Analyzer and FieldBridge within a @Field annotation (without the analyzer code above), as follows, but it appears to use the default analyzer rather than the ones specified with analyzer =:
@Fields({
    @Field(name = "content-nocase",
        index = Index.TOKENIZED,
        analyzer = @Analyzer(impl = CaseInsensitiveNgramAnalyzer.class),
        bridge = @FieldBridge(impl = StringListBridge.class)),
    @Field(name = "content-case",
        index = Index.TOKENIZED,
        analyzer = @Analyzer(impl = CaseSensitiveNgramAnalyzer.class),
        bridge = @FieldBridge(impl = StringListBridge.class)),
})
public List<String> getContents()
Upvotes: 2
Views: 4143
Reputation: 19129
The solution at the moment is to use either a custom scoped analyzer or @AnalyzerDiscriminator together with @AnalyzerDef. This is also discussed on the Hibernate Search forum: https://forum.hibernate.org/viewtopic.php?f=9&t=1016667
Upvotes: 3
Reputation: 2957
I managed to get this working. Hibernate Search appears not to use the specified Analyzer when both analyzer = and bridge = are specified, at least if the specified bridge creates multiple fields.
Manually passing the TokenStream from the desired analyzer to the generated Fields in the bridge got me the expected result:
private void addStringField(String fieldName, String fieldValue, Document luceneDocument, LuceneOptions luceneOptions)
{
    Field field = new Field(fieldName, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector());
    field.setBoost(luceneOptions.getBoost());
    // manually apply token stream from analyzer, as hibernate search does not
    // apply the specified analyzer properly
    try
    {
        field.setTokenStream(analyzer.reusableTokenStream(fieldName, new StringReader(fieldValue)));
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    luceneDocument.add(field);
}
The bridge also implements ParameterizedBridge to specify which analyzer to use (the analyzer is instantiated and stored in a field before this method is called).
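The parameter handling can be sketched roughly as follows. This is a plain-Java stand-in, not my actual bridge: it uses no Hibernate Search or Lucene types, the class AnalyzerChoiceSketch, the parameter key "analyzer" and its two values are hypothetical, and a simple string function stands in for the real Analyzer instance. Only the method name setParameterValues is borrowed from ParameterizedBridge.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in showing how a bridge parameter can select the analyzer.
final class AnalyzerChoiceSketch {
    // Stand-in for the Analyzer stored in a field of the bridge.
    private Function<String, String> analyzer;

    // Called once with the bridge parameters, before any value is indexed.
    void setParameterValues(Map<String, String> parameters) {
        String choice = parameters.get("analyzer"); // hypothetical parameter key
        if ("case-sensitive".equals(choice)) {
            analyzer = text -> text;
        } else if ("case-insensitive".equals(choice)) {
            analyzer = text -> text.toLowerCase(Locale.ROOT);
        } else {
            throw new IllegalArgumentException("Unknown analyzer parameter: " + choice);
        }
    }

    // Applies whichever analyzer stand-in was selected by the parameters.
    String analyze(String value) {
        if (analyzer == null) {
            throw new IllegalStateException("setParameterValues was not called");
        }
        return analyzer.apply(value);
    }
}
```

In the real bridge the selected object would be the Analyzer whose token stream is passed to field.setTokenStream(...), and the parameter would presumably be supplied through the params attribute of @FieldBridge on each @Field.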
Upvotes: 2