Deadfish

Reputation: 2057

Apache Solr - Querying Numbers with Special Characters

Is it possible to query numbers in Solr that contain special characters?

I have a field, score, that holds decimal percentages such as 35.49% or 104.18%.

I need to query this field with greater-than and less-than (range) operators. I tried using WordDelimiterFilterFactory and created a new custom field type like this:

<fieldType name="alphaNumericSort" class="solr.TextField" sortMissingLast="false" omitNorms="true">
      <analyzer>
          <!-- KeywordTokenizer does no actual tokenizing, so the entire
               input string is preserved as a single token
            -->
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <!-- The LowerCase TokenFilter does what you expect, which can be
               useful when you want your sorting to be case insensitive
            -->
          <filter class="solr.WordDelimiterFilterFactory"
                  generateWordParts="1"
                  generateNumberParts="1"
                  catenateWords="0"
                  catenateNumbers="0"
                  catenateAll="0"
                  preserveOriginal="1"
                  types="lang/delim-types.txt" />
          <filter class="solr.LowerCaseFilterFactory" />
          <!-- The TrimFilter removes any leading or trailing whitespace -->
          <filter class="solr.TrimFilterFactory" />
          <!-- Left-pad numbers with zeroes -->
          <filter class="solr.PatternReplaceFilterFactory"
                  pattern="(\d+)" replacement="00000$1" replace="all"
                  />
          <!-- Left-trim zeroes to produce 6 digit numbers -->
          <filter class="solr.PatternReplaceFilterFactory"
                  pattern="0*([0-9]{6,})" replacement="$1" replace="all"
                  />
          <!-- Remove all but alphanumeric characters -->
          <filter class="solr.PatternReplaceFilterFactory"
                  pattern="([^a-z0-9])" replacement="" replace="all"
                  />
      </analyzer>
  </fieldType>

The content of the file delim-types.txt is

% => ALPHA

But when I query like this,

- score:[* TO 100.00] 

It doesn't return any results. Am I doing something wrong?

Upvotes: 0

Views: 884

Answers (1)

MatsLindh

Reputation: 52802

First - I'd avoid naming a field score, as that is also the name Solr uses internally to refer to a document's relevance score after a search (for example in the fl parameter or in a sort).

Your existing chain tries to make sorting work on a text field / StrField by padding numbers to the same length. The last regex replacement filter removes anything that isn't alphanumeric, including the decimal point, so 3.3 and 3.30 will be considered different numbers.

A better way to implement this would be to use a numeric field. If you can accept the inaccuracies of a double field, a TrieDoubleField is probably the best option.
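As a rough sketch of what such a numeric field could look like in schema.xml: the type name tdouble and the attribute values below are assumptions taken from typical example schemas, and score_own is only a placeholder field name chosen to avoid score.

```xml
<!-- schema.xml: sketch only; the type name "tdouble" and attribute values are assumptions -->
<fieldType name="tdouble" class="solr.TrieDoubleField"
           precisionStep="8" positionIncrementGap="0"/>

<!-- "score_own" is a placeholder name that avoids clashing with Solr's own "score" -->
<field name="score_own" type="tdouble" indexed="true" stored="true"/>
```

With a numeric type, a range query such as score_own:[* TO 100.00] compares values numerically instead of as strings. On Solr 7 and later the Trie fields are deprecated in favour of point fields, so solr.DoublePointField would likely be the equivalent class there.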

Removing the % can be done in an update processor, before the value reaches the numeric field. Something that uses the RegexReplaceProcessorFactory could work (update processor chains are defined in solrconfig.xml):

<updateRequestProcessorChain name="remove_percent">
    <processor class="solr.RegexReplaceProcessorFactory">
        <str name="fieldName">score_own</str>
        <str name="pattern">%</str>
        <str name="replacement"></str>
        <bool name="literalReplacement">true</bool>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

You can then reference this update processor chain either by including update.chain=remove_percent in your update request URL, or by configuring the requestHandler with the parameter to make Solr invoke it automagically (see "Configuring a custom chain as a default" on the Update Request Processors wiki page):

<initParams path="/update/**">
    <lst name="defaults">
        <str name="update.chain">remove_percent</str>
    </lst>
</initParams>

or through the definition of the requestHandler:

<requestHandler ... >
    <lst name="defaults">
        <str name="update.chain">remove_percent</str>
    </lst>
</requestHandler>
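To check that the chain is applied, a document like the following could be posted to the update handler (with update.chain=remove_percent in the URL if it isn't configured as a default). This is only an illustrative sketch: the id field and its value are assumptions, and score_own is the field name used in the processor definition above.

```xml
<!-- Hypothetical test document; "id" and "doc-1" are assumptions for illustration -->
<add>
  <doc>
    <field name="id">doc-1</field>
    <!-- The remove_percent chain should strip the "%" before the value is indexed -->
    <field name="score_own">35.49%</field>
  </doc>
</add>
```

If the chain runs, the stored value should come back as 35.49 and the document should match a range query such as score_own:[* TO 100.00].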

Upvotes: 1
