Reputation: 4010
If I sort on a string field called code, I get the following results:
{code:ABC-120GB}
{code:ABC-120GBY}
{code:ABC-120GY}
{code:ABC-120G}
{code:ABC-120GB}
{code:ABC-120GBY}
{code:ABC-120GY}
This is the configuration for that field in the schema.xml file:
<fields>
<field name="code" type="string" indexed="true" stored="true" required="true"/>
</fields>
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
</types>
Upvotes: 0
Views: 931
Reputation: 1242
It looks like the first-level sort on code_length is working. Does the sort on code work if it's the only sort specified? My suspicion is that you would see the same problem if code were the only sort field.
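One way to run that check is to request a listing sorted on code alone. The sketch below only builds the request URL; the host, port, and core name (mycore) are placeholders I made up, so adjust them to your setup.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

def sort_only_by_code_url(rows=20):
    """Build a select URL that sorts on the code field and nothing else."""
    params = {
        "q": "*:*",          # match every document
        "fl": "code",        # return only the field under test
        "sort": "code asc",  # single sort clause, no code_length
        "rows": rows,
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"

print(sort_only_by_code_url())
```

If the order is still wrong with this request, the multi-field sort is not the cause.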
It seems quite likely that the problem is caused by variations in the data that we can't see, because you haven't included the real data. It would be interesting to know whether you can reproduce it with the values you actually posted, or with other non-sensitive values.
For one thing, I would suspect invisible variations in the character encoding. If that's the case, you could try changing the code field to a single-token text-based field rather than an unmodified string field. You then have a choice of filters to add to that fieldType that can normalize encoding variations. A good one to consider is the ICU Folding Filter, which handles a lot of normalization and can be added to your fieldType definition with this line:
<filter class="solr.ICUFoldingFilterFactory"/>
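Before changing the schema, it may be worth testing the encoding suspicion directly on the stored values. This is a rough sketch, not anything Solr provides: suspicious_chars is a helper name I made up, and the sample values are invented, with one of them using U+2010 HYPHEN in place of the ASCII hyphen. Python compares strings by code point, which I believe is close to how an unanalyzed string field orders its values.

```python
import unicodedata

def suspicious_chars(value):
    """List (index, code point, name) for each character outside printable ASCII."""
    found = []
    for i, ch in enumerate(value):
        if not (0x20 <= ord(ch) <= 0x7E):
            name = unicodedata.name(ch, "<unnamed>")
            found.append((i, f"U+{ord(ch):04X}", name))
    return found

# Invented sample data: the second value hides a U+2010 HYPHEN.
codes = ["ABC-120GY", "ABC\u2010120G", "ABC-120GB"]

# Plain code point ordering pushes the look-alike value to the end.
for code in sorted(codes):
    print(repr(code), suspicious_chars(code))
```

Running the same helper over the real code values exported from the index would show whether any of them contain look-alike or invisible characters.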
You could consider this definition for a fieldType called "exact" that might work for you. By "tokenizing" the value into one large token, it keeps the whole-value matching you have now with the code field as type="string" (although the folding filter also makes matching insensitive to case and accents), and it keeps Solr happy by having only one token on a sort field.
<fieldType name="exact" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<!-- KeywordTokenizer does no actual tokenizing, so the entire
input string is preserved as a single token
-->
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.TrimFilterFactory" />
</analyzer>
</fieldType>
Then you'd change your code field definition to:
<field name="code" type="exact" indexed="true" stored="true" required="true" multiValued="false"/>
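To get a feel for what a single-token, normalized sort key does to the ordering, here is a small sketch. It is only a stand-in: it uses Python's NFKC normalization plus casefold and strip, which is not the same as ICU folding, and sort_key and the sample values are made up for illustration.

```python
import unicodedata

def sort_key(raw):
    """Treat the whole value as one token, then normalize and trim it."""
    # Stand-in for the folding step; this is NOT what ICUFoldingFilter does exactly.
    folded = unicodedata.normalize("NFKC", raw).casefold()
    # Trim after folding, mirroring the filter order in the fieldType.
    return folded.strip()

# Invented values: a fullwidth "A" (U+FF21) and a trailing space.
codes = ["ABC-120GY", "\uff21BC-120G", "ABC-120GB "]

print(sorted(codes))                # raw code point order
print(sorted(codes, key=sort_key))  # order after normalization
```

With the raw values the fullwidth variant sorts last; with the normalized key it sorts alongside its ASCII look-alikes.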
Of course, this is somewhat speculative, since I can't see the data, and the ICU Folding Filter cannot correct for everything that might be causing your trouble. But I hope this helps.
I'd also recommend explicitly setting multiValued="false" for any fields you plan to use for sorting. It will be less ambiguous.
Upvotes: 3