Reputation: 6540
I am indexing documents that have a large, textual content field. Most of the time I want to do special processing on that data, as well as on the incoming queries. (My current fieldType definition is at the bottom.)
However, sometimes, for example when the user passes in something in quotation marks, I'd like to use a different query analyzer than the one defined for the field. Perhaps it would use a KeywordTokenizerFactory instead of a WhitespaceTokenizerFactory, so that I can match "multiple words in a phrase" without the words being split apart.
How can I choose a different query analyzer at query time?
I understand that I can use copyField and set up an entirely different field definition, but this would essentially double the space used by my Solr index, which isn't feasible.
<fieldType name="text_en_splitting_reversed" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <!-- convert things like é to e and ŕ to r -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnNumerics="1" splitOnCaseChange="1" types="word-delim-types.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <!-- convert things like é to e and ŕ to r -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
    />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>
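For comparison, the copyField alternative mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: the names content, content_exact, and text_exact are hypothetical, and the analyzer chain is a placeholder for whatever lighter analysis the quoted queries would need. Marking the destination field stored="false" means it adds index terms only, not a second stored copy of the text, though the extra terms still grow the index.

```xml
<!-- Hypothetical second field type with a lighter analysis chain.
     The name text_exact and the filters chosen are assumptions. -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical destination field: indexed for searching, but not stored. -->
<field name="content_exact" type="text_exact" indexed="true" stored="false"/>

<!-- Copy the raw text of the (assumed) source field "content" at index time. -->
<copyField source="content" dest="content_exact"/>
```

With something like this in place, the application would send quoted queries to content_exact and everything else to the original field.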
Upvotes: 1
Views: 1812
Reputation: 9255
It is actually possible to change the analyzer dynamically, but it requires some custom code. Check out slide 30 in http://www.slideshare.net/treygrainger/semantic-multilingual-strategies-in-lucenesolr, where Trey describes using this approach to support different analyzers for multilingual fields. His approach has to switch analyzers for both indexing and query analysis, whereas you only need to do it at query time.
Here's the JIRA feature request that Trey is referencing.
Upvotes: 2