I'm trying to convert 15.6" searches to 15.6 inch. The idea was to first replace 15.6" with 15.6 " and then map the " to inch with the synonym rule " => inch.
I created the type definition:
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern='^([0-9]+([,.][0-9]+)?)(")$' replacement="$1 $3" />
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" />
  </analyzer>
</fieldType>
But it's not working! If I input 15.6" I get 15.6 ", but when I input 15.6 " I get what I want: 15.6 inch.
Why doesn't it work? Am I missing something?
EDIT:
The issue is that 15.6 " is still a single token after your pattern replace filter: inserting a space into a token does not split it into two tokens.
You can see that it's still kept as a single token because there is no | (the token separator) on that line.
Add a Word Delimiter Filter after it (judging from your analysis chain you already have one; it's just not included in your question), or, better, do the replacement in a PatternReplaceCharFilterFactory, which runs before the tokenizer splits the input into separate tokens:
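<analyzer>
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern='^([0-9]+([,.][0-9]+)?)(")$' replacement="$1 $3" />
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" />
</analyzer>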
Depending on your input you might have to adjust the pattern a bit, e.g. drop the ^ and $. A char filter applies the pattern to the whole input string rather than to individual tokens, so the anchors would only match when the entire value is something like 15.6". Make sure that an input such as Macbook 15.6" 256GB is matched appropriately.
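For reference, here is a sketch of the complete field type with the ^ and $ anchors dropped from the char filter pattern. It is untested and assumes synonyms.txt still contains the rule " => inch from the question. Without the anchors, the pattern matches a number followed by " anywhere in the input, so Macbook 15.6" 256GB should become Macbook 15.6 " 256GB before the whitespace tokenizer sees it.

```xml
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Runs on the raw input before tokenizing; no anchors, so it matches inside longer strings -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern='([0-9]+([,.][0-9]+)?)(")' replacement="$1 $3" />
    <!-- The inserted space now makes " a token of its own -->
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- Assumes synonyms.txt contains the rule: " => inch -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" />
  </analyzer>
</fieldType>
```

If this behaves as intended, the Analysis screen should show the tokens Macbook | 15.6 | inch | 256GB for that input. Note that the pattern will also split a number and " that are glued to further text (e.g. 15.6"x becomes 15.6 "x), and the synonym rule will not match the resulting "x token.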