Dave

Reputation: 4328

Filter for the highest value of a substring in a node-set returned using XPath 1.0

I want to get the release number of the most recent snapshot using xpath 1.0. In this example it will be 0.0.3-SNAPSHOT.

<html>
<head><title>Title</title>
</head>
<body>
<h1>Index </h1>
<pre>Name               </pre><hr/>
<pre><a href="../">../</a>
<a href="0.0.1-SNAPSHOT/">0.0.1-SNAPSHOT/</a>          
<a href="0.0.2-SNAPSHOT/">0.0.2-SNAPSHOT/</a>          
<a href="0.0.3-SNAPSHOT/">0.0.3-SNAPSHOT/</a>          
<a href="metadata.xml">metadata.xml</a>   
</pre>
</body></html>

I have done this using

xpath snapshot.xml "(//a)[last()-1]"

I'm not comfortable assuming that the highest snapshot version will always be at index position last()-1.

I can assume that the values (0.0.1, 0.0.2) will always increment from top to bottom of document.

I'd like to write an xpath expression to do the following

1) parse the full nodeset to return only anchor links containing string SNAPSHOT

Expected result

> 0.0.1-SNAPSHOT/
> 0.0.2-SNAPSHOT/
> 0.0.3-SNAPSHOT/

I was successful. There are a few ways of doing this using a predicate:

xpath snapshot.xml "//pre/a/text()[contains( . , 'SNAPSHOT')]"
xpath snapshot.xml "//a/text()[contains( . , 'SNAPSHOT')]"

However too many nodes are returned so I'd then like to filter by either

2a) Get the last node in the set, which doesn't seem possible because contains() returns a boolean, not a node-set

I failed like this

xpath snapshot.xml "(//a)[contains(text(),'SNAPSHOT')last()]"
xpath snapshot.xml "(//a)[contains(text(),'SNAPSHOT')][last()]"
xpath snapshot.xml "(//a)[not ( contains(text(),'SNAPSHOT') ) < text()]"

2b) Get the node with the highest value. That is, for strings such as "0.0.3-SNAPSHOT", select the substrings before -SNAPSHOT (0.0.1, 0.0.2 and 0.0.3) and take the maximum.

And then I failed like this

xpath snapshot.xml "(//a)[ not(../a/text() > text()) ]"

I am using https://www.w3.org/TR/xpath for guidance.

How do I filter for the highest value of a substring in a node-set returned using XPath 1.0? Is it possible in this case?

Upvotes: 0

Views: 273

Answers (2)

har07

Reputation: 89325

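Selecting the last a element that contains the text 'SNAPSHOT' is actually doable and would work for your specific XML sample. Only the parentheses in your attempted XPath were slightly off. Wrapping the whole filtered path in parentheses makes [last()] apply to the complete filtered node-set, so try it this way instead: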

(//a[contains(text(),'SNAPSHOT')])[last()]
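
To make the two steps of that expression concrete, here is a rough Python sketch of what it computes: filter first, then take the last item of the whole filtered list. It is only an illustration, not XPath. It uses the standard-library ElementTree parser on the sample from the question (which happens to be well-formed), and the function name last_snapshot_link is made up for this example.

```python
import xml.etree.ElementTree as ET

# Directory listing from the question (well-formed, so an XML parser accepts it).
LISTING = """<html>
<head><title>Title</title>
</head>
<body>
<h1>Index </h1>
<pre>Name               </pre><hr/>
<pre><a href="../">../</a>
<a href="0.0.1-SNAPSHOT/">0.0.1-SNAPSHOT/</a>
<a href="0.0.2-SNAPSHOT/">0.0.2-SNAPSHOT/</a>
<a href="0.0.3-SNAPSHOT/">0.0.3-SNAPSHOT/</a>
<a href="metadata.xml">metadata.xml</a>
</pre>
</body></html>"""


def last_snapshot_link(xml_text):
    """Return the text of the last <a> (in document order) containing 'SNAPSHOT'."""
    root = ET.fromstring(xml_text)
    # Step 1, like //a[contains(text(),'SNAPSHOT')]: filter every anchor in the document.
    matches = [a.text for a in root.iter("a") if a.text and "SNAPSHOT" in a.text]
    # Step 2, like (...)[last()]: take the last item of the whole filtered list.
    return matches[-1] if matches else None


print(last_snapshot_link(LISTING))  # prints 0.0.3-SNAPSHOT/
```

Like the XPath expression, this relies on the versions being listed in ascending order; it picks the last match, not the numerically largest one.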

Upvotes: 1

zx485

Reputation: 29052

You can do this in XSLT 1.0 with a lexical sort over the @href attributes. As long as every version component is a single digit, lexical order matches numeric order, so applying xsl:sort in descending order puts the highest version first, and extracting the first element gives you the desired result. The rest is just scaffolding. So try this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Suppress all text that is not matched explicitly -->
  <xsl:template match="text()" />
  <xsl:template match="/html/body/pre">
    <xsl:value-of select="'&#10;'" />
    <xsl:variable name="highest">
      <!-- Keep only links whose href starts with a digit, highest href first -->
      <xsl:for-each select="a[substring(@href,1,1) &gt;= 0 and substring(@href,1,1) &lt; 10]">
        <xsl:sort select="@href" order="descending" />
        <!-- Emit only the first link after sorting; plain XSLT 1.0 cannot apply a path to a result tree fragment -->
        <xsl:if test="position() = 1">
          <xsl:value-of select="normalize-space(.)" />
        </xsl:if>
      </xsl:for-each>
    </xsl:variable>
    <xsl:if test="$highest != ''">
      <xsl:value-of select="concat('Latest version is: ',$highest,'&#10;')" />
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

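But this works only while every version component is a single digit. Once a component has several digits, lexical order no longer matches numeric order (for example, "0.0.10" sorts before "0.0.9"), and a different approach is necessary.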
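
If the list of link names can be handed to a host language, one possible "different approach" is to compare the version components as integers. The following Python sketch is only an illustration of that idea, not an XPath or XSLT solution. The helper names version_key and latest_snapshot and the sample list are made up, and it assumes names of the form digits.digits...-SNAPSHOT.

```python
import re


def version_key(name):
    """Turn '0.0.10-SNAPSHOT/' into (0, 0, 10) so versions compare numerically."""
    match = re.match(r"(\d+(?:\.\d+)*)-SNAPSHOT", name)
    if match is None:
        return None  # not a snapshot entry, e.g. '../' or 'metadata.xml'
    return tuple(int(part) for part in match.group(1).split("."))


def latest_snapshot(names):
    """Return the SNAPSHOT entry with the numerically highest version, or None."""
    snapshots = [n for n in names if version_key(n) is not None]
    return max(snapshots, key=version_key) if snapshots else None


# Hypothetical listing with a multi-digit component.
names = ["../", "0.0.2-SNAPSHOT/", "0.0.10-SNAPSHOT/", "0.0.9-SNAPSHOT/", "metadata.xml"]

print(max(n for n in names if "SNAPSHOT" in n))  # lexical: 0.0.9-SNAPSHOT/
print(latest_snapshot(names))                    # numeric: 0.0.10-SNAPSHOT/
```

Comparing tuples of integers orders the versions component by component, which is what the lexical sort on @href cannot do once a component reaches two digits.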

Upvotes: 1
