Bin

Reputation: 13

MarkLogic collation for sorting

I have a sorting requirement where the following rules apply:

  1. Ignore the capitalization of letters
  2. Ignore mathematical symbols and any special characters that do not include a Latin letter
  3. Ignore punctuation
  4. Ignore leading space
  5. Follow the rule of "A" should come before "An". For example the title "A Treatise" comes before the title "An Asian"

I managed to achieve all except #5 by creating a field with the following collation:

http://marklogic.com/collation/en/S1/AS/T00BB

Issue: My sort results below are correct except that 'An Asian' sorts above 'A Treatise'. I followed the steps from this link (custom sorting), but it doesn't seem to work for sorting "An" and "A".

Æ and Words

An Asian

;a next

A Treatise

Beautiful sky

-Ology and -Osophy

<with leading space here>Public Access

Sentencing Frame

'Soda' Vs 'Pop'

The Results

ÜBer

Is there a collation I could define to solve the above issue?

Expected sorting results:

Æ and Words

;a next

A Treatise

An Asian

Beautiful sky

-Ology and -Osophy

<with leading space here>Public Access

Sentencing Frame

'Soda' Vs 'Pop'

The Results

ÜBer

Upvotes: 1

Views: 53

Answers (1)

cyberbrain

Reputation: 5075

With the T00BB part you ignore all spaces in sorting, so A Treatise and An Asian are compared as ATreatise and AnAsian. With S1 you also specified "case and diacritic insensitive", so those two items are effectively sorted as anasian and atreatise. Since "n" sorts before "t", An Asian ends up ahead of A Treatise.
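To illustrate the effect, here is a minimal Python sketch. It is not MarkLogic code and does not use a real collation: it only imitates "ignore spaces, ignore case" with plain string operations, and the helper names `key_ignoring_spaces` and `key_keeping_spaces` are made up for this example.

```python
# Illustration only: plain Python string comparison, not an ICU/MarkLogic collation.
titles = ["An Asian", "A Treatise"]


def key_ignoring_spaces(title: str) -> str:
    """Imitate a collation that ignores spaces and case."""
    return title.replace(" ", "").lower()


def key_keeping_spaces(title: str) -> str:
    """Imitate a collation that ignores case but still compares spaces."""
    return title.lower()


# "anasian" vs "atreatise": the second character decides, and "n" < "t".
sorted_ignoring = sorted(titles, key=key_ignoring_spaces)

# "a treatise" vs "an asian": the space sorts before "n".
sorted_keeping = sorted(titles, key=key_keeping_spaces)

print(sorted_ignoring)
print(sorted_keeping)
```

When spaces take part in the comparison, the space after "A" sorts before the "n" of "An", which gives the order asked for in rule #5.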

Not sure if you can ignore just leading spaces, but ignoring all spaces seems to be the root of your problem to me.

I would compute a field over the data that trims leading whitespace from the contents, and then sort by that field without ignoring spaces. I'm not an expert in this, so there may well be a better solution.
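As a rough sketch of that idea in Python (again not MarkLogic code; `sort_key` is a hypothetical helper, and it only covers leading whitespace and case, not the punctuation or diacritic rules that the collation would still have to handle):

```python
# Illustration only: a precomputed sort key that drops leading whitespace
# but keeps the spaces between words.
titles = [" Public Access", "An Asian", "A Treatise", "Beautiful sky"]


def sort_key(title: str) -> str:
    """Trim leading whitespace, then compare case-insensitively."""
    return title.lstrip().casefold()


sorted_titles = sorted(titles, key=sort_key)

for title in sorted_titles:
    print(repr(title))
```

Because only the leading whitespace is removed, the space inside "A Treatise" still takes part in the comparison, and the title with the leading space sorts under "P" as intended.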

There is also official MarkLogic documentation on the Collation URI Syntax.

Upvotes: 1
