esb2010

Reputation: 1

XPath to get nodes with a matching child-node and highest value for other child-node

I need help in XPath 1.0 for filtering the below XML such that I get only the bars with distinct 'id' and with the highest 'validity/date':

<foo name="fooName">
    <bar name="barName">
        <id>1111</id>
        <validity>
            <date>20170920</date>
        </validity>
    </bar>
    <bar name="barName">
        <id>1111</id>
        <validity>
            <date>20170922</date>
        </validity>
    </bar>
    <bar name="barName">
        <id>1111</id>
        <validity>
            <date>20170921</date>
        </validity>
    </bar>
    <bar name="barName">
        <id>2222</id>
        <validity>
            <date>20170921</date>
        </validity>
    </bar>
    <bar name="barName">
        <id>2222</id>
        <validity>
            <date>20170923</date>
        </validity>
    </bar>
</foo>

I tried a lot of options and did some research, but was not able to figure out the exact solution.

Expected XML after filtering should look like:

<foo name="fooName">
    <bar name="barName">
        <id>1111</id>
        <validity>
            <date>20170922</date>
        </validity>
    </bar>
    <bar name="barName">
        <id>2222</id>
        <validity>
            <date>20170923</date>
        </validity>
    </bar>
</foo>

Upvotes: 0

Views: 557

Answers (2)

esb2010

Reputation: 1

As suggested, I came up with the XSLT below, which seems to work fine:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
    <xsl:key name="bars-by-id" match="foo/bar" use="id" />
    <xsl:template match="foo">
        <foo name="fooName">
            <xsl:for-each select="bar[count(. | key('bars-by-id', id)[1]) = 1]">
                <xsl:variable name="currentID" select="id" />
                <xsl:variable name="barsForID" select="key('bars-by-id', $currentID)"/>
                <xsl:copy-of select="$barsForID[not(../bar[id=$currentID]/validity/date > validity/date)]" />
            </xsl:for-each>
        </foo>
    </xsl:template>
</xsl:stylesheet>

Thanks for the suggestions, it really helped. Please feel free to correct me.

Upvotes: 0

C. M. Sperberg-McQueen

Reputation: 25034

You should read up on "Muenchian grouping", to which michael.hor257k has already given you a pointer. (A web search will find plenty of others.)

What Muenchian grouping does is make faster what you can in principle do without it. In some situations the added speed makes the difference between 'possible in principle' and 'workable in practice'. But in some situations a simple-minded approach to this problem suffices.

Problem 1: you only want one 'bar' element in the output for each distinct 'ID'. (Note that your sample output shows that your description is wrong: you do NOT want "only bars with unique 'id'", since none of the bars with ID 1111 or 2222 has a unique ID in the input. You want a single output for each distinct value of 'id'. Not the same thing.)

One way to solve this problem: write two templates for 'bar', one which fires for the first occurrence of a given 'id' (and actually does the work of finding the greatest validity/date value), and the other which causes all later occurrences of 'bar' with that 'id' to be ignored.

<xsl:template match="bar" priority="10.0">
   <!--* find the highest validity/date with this ID here,
       * do what needs to be done. *-->
   ...
</xsl:template>
<xsl:template match="bar[id = preceding-sibling::bar/id]"
              priority="20.0"/>

I've given explicit priorities to warn future-me that I'm trying to do something clever here (and to prevent future-me from screwing it up by changing the match patterns in such a way as to change the relative priorities).
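
As a rough sketch only, here is one way the elided body of the first template might be filled in, wrapped in a complete stylesheet so it can be run against the sample input. The variable name same-id and the wrapper template for 'foo' are assumptions added for illustration; they are not part of the original suggestion.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumed wrapper: copy 'foo' and its attributes, then process each 'bar'. -->
  <xsl:template match="foo">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="bar"/>
    </xsl:copy>
  </xsl:template>

  <!-- Fires only for the first 'bar' of each id, because the
       higher-priority template below swallows the later ones. -->
  <xsl:template match="bar" priority="10.0">
    <!-- All sibling 'bar' elements sharing this id (hypothetical variable name). -->
    <xsl:variable name="same-id" select="../bar[id = current()/id]"/>
    <!-- Keep those whose date is not less than any date in the group. -->
    <xsl:copy-of
        select="$same-id[not(validity/date &lt; $same-id/validity/date)]"/>
  </xsl:template>

  <!-- Later occurrences of an id already seen: produce nothing. -->
  <xsl:template match="bar[id = preceding-sibling::bar/id]"
                priority="20.0"/>
</xsl:stylesheet>
```

Note that if two 'bar' elements with the same id share the highest date, this sketch copies both of them.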

Another way to do it is to put a choose/when inside the template for 'bar'.

<xsl:template match="bar">
  <xsl:variable name="id" select="string(id)"/>
  <xsl:choose>
    <xsl:when test="preceding::bar[id=$id]"/>
    <xsl:otherwise>
      <!--* this is the first of this ID, deal with this ID now *-->
      ...
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

This second pattern may make it easier to formulate the logic needed to find the 'bar' elements you actually want to copy to the output. You want to process not the first instance of each ID, but the instance(s) with the highest validity/date value:

<xsl:template match="bar">
  <xsl:variable name="id" select="string(id)"/>

  <xsl:choose>
    <!--* the behavior of comparisons here requires a little
        * bit of standing on our heads.  We want this 'bar' if
        * its validity/date value is greater than or equal to
        * all other such values for this ID.  So first we filter
        * out all cases where there is a higher validity/date value
        * on another 'bar' with this ID. *-->
    <xsl:when test="validity/date &lt; //bar[id=$id]/validity/date"/>

    <!--* The 'otherwise' case handles situations where this
        * is the only 'bar' with this ID, or where there is no
        * higher validity/date value. *-->
    <xsl:otherwise>
      <xsl:copy-of select="."/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
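
For anyone who wants to try this directly, the template above can be dropped into a complete stylesheet along the following lines. This is a minimal sketch: the template for 'foo', the xsl:output settings, and xsl:strip-space are assumptions of mine and are not required by the technique itself.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumed wrapper: copy 'foo' with its attributes and let the
       'bar' template decide which children survive. -->
  <xsl:template match="foo">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="bar"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="bar">
    <xsl:variable name="id" select="string(id)"/>
    <xsl:choose>
      <!-- Skip this 'bar' if some 'bar' with the same id has a higher date. -->
      <xsl:when test="validity/date &lt; //bar[id=$id]/validity/date"/>
      <!-- Otherwise it carries the highest date for its id: copy it. -->
      <xsl:otherwise>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample input in the question, this should keep the 'bar' dated 20170922 for id 1111 and the one dated 20170923 for id 2222. The comparison treats the dates as numbers, which works here only because they are written as yyyymmdd.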

If this is a one-off or run-seldom stylesheet run on 'manageable' inputs, this may be fast enough, and this pattern may be easier to understand than Muenchian grouping, unless you already have a very good understanding of keys and their uses. If it's too slow, Muenchian grouping will show you what is normally a faster way of accomplishing the same thing.

[Note: the initial version of the answer had a maxdate variable

<xsl:variable name="maxdate" 
              select="max(//bar[id=$id]/validity/date)"/>

and simply compared the current value to it:

<xsl:when test="validity/date = $maxdate">
  <xsl:copy-of select="."/>
</xsl:when>

But the only aggregate functions in XPath 1.0 are count() and sum(). I would say "See how much easier this is in XSLT 2.0?" but if you were in 2.0 the entire thing would just be something like

<xsl:sequence select="for $v in distinct-values(//bar/id)
    for $max in max(//bar[id=$v]/validity/date)
    return //bar[id=$v and validity/date = $max]"/>

and the max() function plays really a relatively modest role in making things so much simpler.]

Upvotes: 1
