BlueDolphin

Reputation: 9765

How to use IBM ICU collator to handle special characters?

We are using the IBM ICU Collator to sort some of our internal string lists. The strings contain special characters, such as 0x1, 0x2, and 0x3, that separate internal structures, and they may also contain text in mixed languages.

We found that the ICU Collator sorts them in an unexpected way. For example, we have these strings:

firstName
firstName\x1Account Name
firstName - lastName

Here \x1 denotes the character with decimal value 1.

We expect sorting to keep this order, but in the English locale it gives the following result instead:

firstName
firstName - lastName
firstName\x1Account Name

Is there a setting that would let us use special characters with values below 0x5 and have them sort the way we expect?

Thanks.

Upvotes: 0

Views: 220

Answers (1)

Steven R. Loomis

Reputation: 4350

I would recommend sorting only the individual sub-fields against each other, rather than the combined strings. Barring that, you could append a rule string such as & \uFFFF = \u0001 = \u0002 = \u0003 = \u0004 = \u0005, which says that 0x1, 0x2, 0x3, 0x4, and 0x5 all sort after any other text.
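As a rough illustration of the sub-field approach, here is a minimal Python sketch. It is not ICU code: the names `SEPARATORS` and `field_key` are hypothetical, and `str.casefold` is only a stand-in for a real collation key (with an ICU binding you would presumably pass the collator's sort-key function instead). The idea is to split each string on the separator characters and compare the resulting fields one by one, so the separators themselves never reach the collator.

```python
import re

# Separator characters from the question (0x1 to 0x3); adjust as needed.
SEPARATORS = re.compile(r"[\x01-\x03]")


def field_key(text, collate=str.casefold):
    """Build a sort key made of one collation key per sub-field.

    `collate` is a placeholder (assumption): swap in an ICU-backed
    sort-key function to get locale-aware ordering within each field.
    """
    return tuple(collate(field) for field in SEPARATORS.split(text))


names = [
    "firstName - lastName",
    "firstName\x01Account Name",
    "firstName",
]

# Tuples compare field by field, so a shorter first field wins
# before the second field is ever considered.
ordered = sorted(names, key=field_key)
print([repr(name) for name in ordered])
```

With the three strings from the question this yields firstName, then firstName\x1Account Name, then firstName - lastName, because the first field "firstName" is a prefix of "firstName - lastName" and so sorts ahead of it.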

Upvotes: 0
