albinosilver

Reputation: 65

How to sort XML entries that contain numbers but sometimes also contain letters?

I have these id values that I want sorted:

<rule id="1.1">
</rule>
<rule id="1.2">
</rule>
<rule id="1.3">
</rule>
<rule id="Id. 4.3">
</rule>
<rule id="Id. 4.9">
</rule>
<rule id="Id. 4.10">
</rule>
<rule id="Id. 4.11">
</rule>
<rule id="Id. 4.12">
</rule>

Currently I sort like this. It works for the id values that contain only numbers, but not for the ones that also contain letters:

<xsl:sort select="substring-before(@id, '.')" data-type="number"/>
<xsl:sort select="substring-after(@id, '.')" data-type="number"/>

The order it is currently giving me is:

Id. 4.10
Id. 4.11
Id. 4.12
Id. 4.3
Id. 4.9
1.1
1.2
1.3

How can I sort it so the order is:

Id. 4.3
Id. 4.9
Id. 4.10
Id. 4.11
Id. 4.12
1.1
1.2
1.3

Upvotes: 3

Views: 331

Answers (2)

Michael Kay

Reputation: 163322

XSLT 3.0 defines a collation URI for this:

<xsl:sort select="@id" collation="http://www.w3.org/2013/collation/UCA?numeric=yes"/>

This treats any sequence of digits as a number, so 2.20(a)-3 sorts before 2.20(a)-10 and after 2.8(b)-4.

But this (I think) would put "Id. 4.10" after "1.3", because digits sort before letters. To solve that, precede it with another sort key:

<xsl:sort select="not(starts-with(@id, 'Id'))"/>

(false sorts before true, so the "Id." entries come first.)
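A minimal XSLT 3.0 sketch putting the two sort keys together. It assumes the <rule> elements sit inside a root element called <rules> (hypothetical; the question doesn't show the wrapper) and that your processor supports the UCA numeric collation, which current Saxon releases do. With the ids from the question it should give the requested order.

```xml
<!-- Sketch: assumes a hypothetical <rules> root element around the <rule> elements -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <xsl:template match="/rules">
    <rules>
      <xsl:for-each select="rule">
        <!-- Key 1: "Id..." entries first (false sorts before true) -->
        <xsl:sort select="not(starts-with(@id, 'Id'))"/>
        <!-- Key 2: compare each run of digits as a number, so 4.9 precedes 4.10 -->
        <xsl:sort select="@id"
                  collation="http://www.w3.org/2013/collation/UCA?numeric=yes"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </rules>
  </xsl:template>

</xsl:stylesheet>
```

Because the collation compares digit runs rather than fixed positions, deeper ids such as "1.10.2" need no extra sort keys.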

This is implemented in current Saxon releases. Earlier Saxon releases provide the collation URI

http://saxon.sf.net/collation?alphanumeric=yes

with similar semantics.

If that isn't available to you and your ids always have the same number of numeric components, you can split each value with regular expressions and use one sort key per component.
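A sketch of that regex approach for XSLT 2.0, assuming every id has exactly two numeric components, optionally preceded by a non-digit prefix such as "Id. ", and assuming a hypothetical <rules> root element. Each additional level would need another capture group and another sort key.

```xml
<!-- Sketch: assumes a hypothetical <rules> root and ids of the form [prefix]N.N -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <xsl:template match="/rules">
    <rules>
      <xsl:for-each select="rule">
        <!-- Key 1: ids that do not start with a digit ("Id. ...") come first -->
        <xsl:sort select="matches(@id, '^\d')"/>
        <!-- Key 2: first numeric component, after any non-digit prefix -->
        <xsl:sort select="replace(@id, '^\D*(\d+)\.(\d+)$', '$1')" data-type="number"/>
        <!-- Key 3: second numeric component -->
        <xsl:sort select="replace(@id, '^\D*(\d+)\.(\d+)$', '$2')" data-type="number"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </rules>
  </xsl:template>

</xsl:stylesheet>
```

If an id doesn't match the pattern, replace() returns it unchanged, its numeric keys become NaN, and it sorts ahead of the numeric values within its group rather than raising an error.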

Upvotes: 2

michael.hor257k

Reputation: 116992

The order you show can be accomplished by using (XSLT 2.0 or later, since replace() is an XPath 2.0 function):

<xsl:sort select="number(starts-with(@id, 'Id. '))" data-type="number" order="descending"/>
<xsl:sort select="substring-before(replace(@id, '^Id\. ', ''), '.')" data-type="number"/>
<xsl:sort select="substring-after(replace(@id, '^Id\. ', ''), '.')" data-type="number"/>
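The replace() calls above won't run on an XSLT 1.0-only processor. A possible 1.0 sketch strips the 4-character "Id. " prefix arithmetically instead. As before, a <rules> root element is assumed (hypothetical).

```xml
<!-- Sketch: assumes a hypothetical <rules> root; ids are "N.N" or "Id. N.N" -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <xsl:template match="/rules">
    <rules>
      <xsl:for-each select="rule">
        <!-- "Id. " entries first: true() converts to 1, sorted descending -->
        <xsl:sort select="number(starts-with(@id, 'Id. '))"
                  data-type="number" order="descending"/>
        <!-- 4 * starts-with(...) is 4 for "Id. " ids and 0 otherwise,
             so substring() skips the prefix only when it is present -->
        <xsl:sort select="substring-before(substring(@id, 1 + 4 * starts-with(@id, 'Id. ')), '.')"
                  data-type="number"/>
        <xsl:sort select="substring-after(substring(@id, 1 + 4 * starts-with(@id, 'Id. ')), '.')"
                  data-type="number"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </rules>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet keeps version="1.0" deliberately: multiplying a number by a boolean relies on XPath 1.0 conversion rules, which a 2.0+ processor applies only in backwards-compatible mode.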

This could probably be simplified by using a collation attribute (which would also handle more levels in your numbering scheme, e.g. "1.10.2"); however, this depends on which processor you use.

Upvotes: 2
