Is it possible to query numbers in SOLR which have special characters?
I have a field score which can have decimal percentages like 35.49%, 104.18%, etc. I need to query this field with greater-than and less-than operators. I have tried using WordDelimiterFilterFactory and created a new custom field type like this:
<fieldType name="alphaNumericSort" class="solr.TextField" sortMissingLast="false" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no actual tokenizing, so the entire
         input string is preserved as a single token
    -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- The LowerCase TokenFilter does what you expect, which can be
         useful when you want your sorting to be case insensitive
    -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            preserveOriginal="1"
            types="lang/delim-types.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory" />
    <!-- Left-pad numbers with zeroes -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="(\d+)" replacement="00000$1" replace="all"
    />
    <!-- Left-trim zeroes to produce 6 digit numbers -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="0*([0-9]{6,})" replacement="$1" replace="all"
    />
    <!-- Remove all but alphanumeric characters -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z0-9])" replacement="" replace="all"
    />
  </analyzer>
</fieldType>
The content of the file delim-types.txt is
% => ALPHA
But when I query like this:
score:[* TO 100.00]
it doesn't return any results. Am I doing something wrong?
Reputation: 52802
First, I'd avoid naming a field score, as that is also the field name Solr uses internally to refer to the score of the document after performing a search (in the fl parameter or in a sort).
Your existing chain tries to make sorting on a text field / StrField work by padding numbers to the same length. The regex replacement filters then remove anything that is not alphanumeric, including the decimal point, so 3.3 and 3.30 will be treated as different values.
A better way to implement this would be to use a numeric field. If you can accept the inaccuracies of a double field, a TrieDoubleField is probably the best option.
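As a rough sketch of the numeric-field approach, the schema entries could look like the following. The type name tdouble and the precisionStep value follow Solr's example schemas and are assumptions here, as is reusing the score_own field name from the processor config further down. On Solr 7 and later, where Trie fields are deprecated, solr.DoublePointField would be the analogous class.

```xml
<!-- schema.xml (sketch): numeric type so range queries compare values, not strings -->
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>

<!-- Hypothetical field holding the percentage with the % already stripped, e.g. 35.49 -->
<field name="score_own" type="tdouble" indexed="true" stored="true"/>
```

With a field like this, a range query such as score_own:[* TO 100.00] compares the values numerically.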
Removing the % can be done in an update processor. Something that uses the RegexReplaceProcessorFactory could work (update processor chains are defined in solrconfig.xml):
<updateRequestProcessorChain name="remove_percent">
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">score_own</str>
    <str name="pattern">%</str>
    <str name="replacement"></str>
    <bool name="literalReplacement">true</bool>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
You can then reference this update processor chain either by including update.chain=remove_percent in your update request URL, or by configuring the requestHandler with the parameter so that Solr invokes it automatically (see "Configuring a custom chain as a default" on the Update Request Processors wiki page):
<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">remove_percent</str>
  </lst>
</initParams>
or through the definition of the requestHandler:
<requestHandler ... >
  <lst name="defaults">
    <str name="update.chain">remove_percent</str>
  </lst>
</requestHandler>
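To try the chain out, a document along these lines could be posted to the update handler in Solr's XML update format. The id field and its value are assumptions for illustration; score_own matches the fieldName configured in the processor above.

```xml
<!-- Sketch of an update message; the processor should turn 35.49% into 35.49 before indexing -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="score_own">35.49%</field>
  </doc>
</add>
```

If the chain is not configured as a default, append update.chain=remove_percent to the update URL when posting this.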