Reputation: 92200
On this Solr documentation page I see the following comment:
Note: Its probably best to use the ElisionFilter before WordDelimiterFilter. This will prevent very slow phrase queries.
http://wiki.apache.org/solr/LanguageAnalysis#French
Can someone explain to me why it could lead to slow phrase queries, please? My WordDelimiterFilter configuration actually works fine, and I don't think I need the ElisionFilter since its effect is somehow already covered by the WordDelimiterFilter configuration.
I just wonder what the impact on performance is.
Upvotes: 0
Views: 186
Reputation: 11023
Based on SOLR-1938, if you have ElisionFilter before WordDelimiterFilter, then l'avion will generate only one token: avion. But if ElisionFilter is not there, then depending on the settings of your WordDelimiterFilter, it could generate more than one token, such as l, avion, lavion.
Since avion is generated by the WordDelimiterFilter either way, it looks to you as though the ElisionFilter is already included in there.
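To make the difference between the two filter orders concrete, here is a minimal Python sketch. It is a toy model, not Lucene's actual implementation: the functions `elision_filter` and `word_delimiter_filter`, the `ARTICLES` set, and the `catenate_words` flag are hypothetical stand-ins that only mimic the behavior described above.

```python
import re

# Hypothetical set of elided French articles; the real ElisionFilter
# has its own configurable list.
ARTICLES = {"l", "d", "j", "m", "t", "s", "n", "c", "qu"}


def elision_filter(tokens):
    """Strip a leading elided article such as l' from each token."""
    out = []
    for tok in tokens:
        head, sep, tail = tok.partition("'")
        if sep and tail and head.lower() in ARTICLES:
            out.append(tail)
        else:
            out.append(tok)
    return out


def word_delimiter_filter(tokens, catenate_words=True):
    """Toy stand-in for WordDelimiterFilter: split each token on
    non-word characters and optionally also emit the joined parts."""
    out = []
    for tok in tokens:
        parts = [p for p in re.split(r"\W+", tok) if p]
        out.extend(parts)
        if catenate_words and len(parts) > 1:
            out.append("".join(parts))
    return out


# ElisionFilter first: the article is removed before any splitting happens.
with_elision = word_delimiter_filter(elision_filter(["l'avion"]))

# No ElisionFilter: the apostrophe is treated as a delimiter.
without_elision = word_delimiter_filter(["l'avion"])

print(with_elision)     # ['avion']
print(without_elision)  # ['l', 'avion', 'lavion']
```

In this model, both chains produce avion, which is why the WordDelimiterFilter alone appears to work; the difference is the extra tokens that come along with it.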
I guess the comment about the slow phrase queries means that if l'avion is searched for, then it will search for more than one token if ElisionFilter is not there.
Update: This post nails the problem: http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
It says: "What we discovered is that the word “l’art” was being searched as a phrase query “l art”. Phrase queries are much slower than Boolean queries because the search engine has to read the positions index for the words in the phrase into memory and because there is more processing involved."
So I would guess the problem is for a search in double quotes like "l'avion".
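As a rough illustration of why a phrase query costs more than a single-term lookup, here is a toy positional index in Python. It is only a sketch under simplifying assumptions: `build_index`, `term_query`, and `phrase_query` are hypothetical helpers, the documents are made up, and a real engine like Lucene stores and reads postings very differently.

```python
from collections import defaultdict


def build_index(docs):
    """Map term -> {doc_id: [positions]}: a toy positional inverted index."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def term_query(index, term):
    """Single-term lookup: only the set of matching doc ids is needed."""
    return set(index.get(term, {}))


def phrase_query(index, terms):
    """Phrase lookup: intersect doc ids, then read the positions of every
    term and check that they occur consecutively."""
    if not terms:
        return set()
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    hits = set()
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits


# Made-up documents, already tokenized the way the chain without
# ElisionFilter would split them.
docs = {
    1: "l avion est grand",
    2: "avion de l armee",
    3: "l art et l avion",
}
index = build_index(docs)

print(sorted(term_query(index, "avion")))           # [1, 2, 3]
print(sorted(phrase_query(index, ["l", "avion"])))  # [1, 3]
```

The term query never touches the position lists, while the phrase query has to walk them for every candidate document. With a term as frequent as l in French text, those lists are presumably long, which fits the slowdown the quoted post describes.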
Upvotes: 1