Matthieu Napoli

Reputation: 49533

Solr synonym replacement fails?

I have a SynonymFilterFactory using a synonym file. From the Solr documentation:

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit

However, when querying sea biscuit, I end up with results related to sea, biscuit and seabiscuit.

This is as if I had the following configuration (with expand="true"):

sea biscuit, sea biscit, seabiscuit

I don't understand this behavior, because in the Solr analysis tool, when querying sea biscuit it is properly replaced by seabiscuit only.

In other words: explicit synonym mapping with => doesn't work.


Edit: field configuration

Tokenized: true

Class Name: org.apache.solr.schema.TextField

Index Analyzer: org.apache.solr.analysis.TokenizerChain

Filters:

org.apache.solr.analysis.StopFilterFactory args:{enablePositionIncrements: true words: stopwords.txt ignoreCase: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{preserveOriginal: 1 catenateWords: 1 catenateNumbers: 1 splitOnCaseChange: 1 catenateAll: 0 generateNumberParts: 1 generateWordParts: 1 }
org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected: protwords.txt }
org.apache.solr.analysis.LengthFilterFactory args:{min: 2 max: 500 }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
org.apache.solr.analysis.ASCIIFoldingFilterFactory args:{}

Query Analyzer: org.apache.solr.analysis.TokenizerChain

Filters:

org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.SynonymFilterFactory args:{expand: true ignoreCase: true synonyms: synonyms.txt }
org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{preserveOriginal: 1 catenateWords: 0 catenateNumbers: 0 splitOnCaseChange: 1 catenateAll: 0 generateNumberParts: 1 generateWordParts: 1 }
org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected: protwords.txt }
org.apache.solr.analysis.LengthFilterFactory args:{min: 2 max: 500 }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
org.apache.solr.analysis.ASCIIFoldingFilterFactory args:{}

Upvotes: 4

Views: 2330

Answers (2)

Yashveer Rana

Reputation: 558

SynonymFilterFactory has been deprecated and should be replaced with SynonymGraphFilterFactory. It squashes tokens and fixes issues with multi-word synonyms when more than one token exists at the same position.

Upvotes: 1

Romain Meresse

Reputation: 3044

Are you doing a phrase query (using double quotes)? If not, the query parser splits the input on whitespace before analysis, so the SynonymFilter receives two separate tokens (sea and biscuit), one at a time. In that case, no matching synonym is found.
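
To illustrate the difference, here is a toy model in Python. It is not Solr or Lucene code, only a sketch under the assumption that an unquoted query is split on whitespace before each piece is analyzed. The names `apply_synonyms`, `phrase_query` and `unquoted_query` are made up for this example.

```python
# Toy model of explicit "=>" synonym mappings. This is NOT Lucene's
# implementation, only an illustration of multi-token matching.
SYNONYMS = {
    ("sea", "biscuit"): ["seabiscuit"],
    ("sea", "biscit"): ["seabiscuit"],
    ("i-pod",): ["ipod"],
    ("i", "pod"): ["ipod"],
}
MAX_LEN = max(len(key) for key in SYNONYMS)


def apply_synonyms(tokens):
    """Replace the longest matching token sequence at each position."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in SYNONYMS:
                out.extend(SYNONYMS[key])
                i += n
                break
        else:
            # No mapping starts at this position: keep the token as-is.
            out.append(tokens[i])
            i += 1
    return out


def analyze(text):
    """Lowercase, split on whitespace, then apply the synonym mappings."""
    return apply_synonyms(text.lower().split())


def phrase_query(text):
    """Quoted query: the whole text goes through the analyzer as one stream."""
    return analyze(text)


def unquoted_query(text):
    """Unquoted query (assumed behavior): each whitespace-separated chunk
    is analyzed on its own, so multi-token mappings never see both words."""
    terms = []
    for chunk in text.split():
        terms.extend(analyze(chunk))
    return terms


print(phrase_query("sea biscuit"))
print(unquoted_query("sea biscuit"))
```

In this model the quoted form sees both tokens together and can apply the `sea biscuit => seabiscuit` mapping, while the unquoted form only ever sees `sea` and `biscuit` separately. The analysis tool behaves like the first case, which may be why it shows the replacement working.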

By the way, it's almost always a better idea to handle synonyms at index time. See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Upvotes: 0
