Maputo

Reputation: 995

Solr synonym graph filter not working after other filter

I'm trying to convert searches for 15.6" into 15.6 inch. The idea was to first replace 15.6" with 15.6 " and then match the " with the synonym rule " => inch. I created this type definition:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.PatternReplaceFilterFactory" pattern='^([0-9]+([,.][0-9]+)?)(")$' replacement="$1 $3" />
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" />
    </analyzer>
</fieldType>

but it's not working! If I input 15.6" I get 15.6 ", but when I input 15.6 " I get what I want - 15.6 inch.

Why doesn't it work? Am I missing something?

EDIT:

[Screenshots of the Solr Analysis screen: one labeled "Not working", one labeled "Working"]

Upvotes: 1

Views: 658

Answers (1)

MatsLindh

Reputation: 52792

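The issue is that 15.6 " is still a single token after your pattern replace filter. Token filters work on the tokens the tokenizer has already produced, so putting a space inside a token does not split it. The synonym filter then sees the one token 15.6 ", which doesn't match your rule for ".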

In the analysis output you can see that it's still kept as a single token, as there is no | on the line (which is what separates the tokens).

Either add a Word Delimiter Filter after it (it seems from your analysis chain that you already have one, it's just not included in your question), or, better, do the replacement in a PatternReplaceCharFilterFactory, which runs before the tokenizer splits the input into separate tokens:

<analyzer>
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern='^([0-9]+([,.][0-9]+)?)(")$' replacement="$1 $3" />
  <tokenizer ...>

You might have to massage the pattern a bit (i.e. lose the ^ and $, which aren't respected by Solr anyway, iirc) depending on your input, since it'll now be applied to the whole input string. Make sure that "Macbook 15.6" 256GB" is matched appropriately.
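As a rough sketch of what the complete field type could look like with the char filter in place: this reuses the field type name, tokenizer and synonyms file from the question, and the unanchored pattern is my own guess at a reasonable starting point, not something I've tested against your data. It assumes synonyms.txt still contains the rule " => inch.

```xml
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <!-- Runs on the raw input before tokenization: puts a space between a number and the " that follows it -->
        <!-- Assumption: no ^ / $ anchors, so the number can appear anywhere in the input -->
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern='([0-9]+(?:[,.][0-9]+)?)"' replacement='$1 "' />
        <!-- The tokenizer now sees 15.6 and " separated by whitespace and emits two tokens -->
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <!-- The standalone " token can now match the synonym rule -->
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" />
    </analyzer>
</fieldType>
```

With this chain, an input like Macbook 15.6" 256GB should reach the tokenizer as Macbook 15.6 " 256GB, so the " ends up as its own token. Run it through the Analysis screen to confirm before relying on it.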

Upvotes: 2
