Sebastien Lorber

Reputation: 92200

ElisionFilter before WordDelimiterFilter

On this Solr documentation page I see the following comment:

Note: Its probably best to use the ElisionFilter before WordDelimiterFilter. This will prevent very slow phrase queries.

http://wiki.apache.org/solr/LanguageAnalysis#French

Can someone please explain why this could lead to slow phrase queries? My WordDelimiterFilter configuration actually works fine, and I don't think I need the ElisionFilter since its effect is somehow already included in the WordDelimiterFilter configuration.

I just wonder what the impact on performance is...

Upvotes: 0

Views: 186

Answers (1)

arun

Reputation: 11023

Based on SOLR-1938, if you have ElisionFilter before WordDelimiterFilter, then l'avion will generate only one token, avion. But if ElisionFilter is not there, then depending on the settings of your WordDelimiterFilter, it could generate more than one token, such as:

l, avion, lavion

Since avion is generated by the WordDelimiterFilter anyway, it looks to you as though the ElisionFilter were already included in it.
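To make the difference concrete, here is a rough Python sketch of the two filter chains. The functions elision_filter and word_delimiter_filter are hypothetical, heavily simplified stand-ins written for this illustration; they are not the Lucene/Solr implementations, and the article list and the catenate flag are assumptions.

```python
import re

# Assumed list of elided French articles; the real filter's list may differ.
ELIDED_ARTICLES = {"l", "d", "j", "m", "t", "s", "n", "c", "qu"}

def elision_filter(tokens):
    """Hypothetical stand-in: strip a leading elided article such as l' or d'."""
    out = []
    for tok in tokens:
        head, sep, tail = tok.partition("'")
        if sep and tail and head.lower() in ELIDED_ARTICLES:
            out.append(tail)
        else:
            out.append(tok)
    return out

def word_delimiter_filter(tokens, catenate=True):
    """Hypothetical stand-in: split on punctuation, optionally add the joined form."""
    out = []
    for tok in tokens:
        parts = [p for p in re.split(r"[\W_]+", tok) if p]
        out.extend(parts)
        if catenate and len(parts) > 1:
            out.append("".join(parts))
    return out

# ElisionFilter first: the article is gone before the word is split.
with_elision = word_delimiter_filter(elision_filter(["l'avion"]))
# WordDelimiterFilter alone: the apostrophe acts as a delimiter.
without_elision = word_delimiter_filter(["l'avion"])

print(with_elision)     # ['avion']
print(without_elision)  # ['l', 'avion', 'lavion']
```

In both cases avion is indexed, which is why the results look the same, but only the second chain produces the extra l and lavion tokens.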

I guess the comment about slow phrase queries means that, without ElisionFilter, a search for l'avion has to look up more than one token instead of just avion.

Update: This post nails the problem: http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance

It says:

"What we discovered is that the word “l’art” was being searched as a phrase query “l art”. Phrase queries are much slower than Boolean queries because the search engine has to read the positions index for the words in the phrase into memory and because there is more processing involved."

So I would guess the problem is for a search in double quotes like "l'avion".
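The behaviour described in that post can be sketched roughly as follows. This is only an illustration under assumptions: analyze and build_query are hypothetical helpers, not Solr's query parser, and they just model the idea that one input word which analyzes to several tokens ends up as a phrase query.

```python
import re

def analyze(word):
    """Hypothetical analyzer: split on punctuation, as a word-delimiter step would."""
    return [p for p in re.split(r"[\W_]+", word) if p]

def build_query(word):
    """Describe the kind of query a parser might build for a single input word."""
    tokens = analyze(word)
    if len(tokens) > 1:
        # Several tokens from one word: treated as a phrase, which needs position data.
        return ("phrase", tokens)
    # A single token: a plain term lookup, no positions needed.
    return ("term", tokens)

print(build_query("l'art"))  # ('phrase', ['l', 'art'])
print(build_query("art"))    # ('term', ['art'])
```

With an elision step in front, l'art would already be reduced to art before the split, so the cheaper term query would be used.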

Upvotes: 1
