Hammer Time

Reputation: 33

XPATH selecting entire tree including only the first

Given the following structure, I want an XPath expression that selects the entire tree but includes only the first date, excluding all of the other dates. The number of dates after the first one is not constant. Any ideas? My apologies if the format isn't correct.

<A>
    <B>
        <DATE>04272011</DATE>
        <C>
           <D>
                <DATE>02022011</DATE>
           </D>
           <D>
                <DATE>03142011</DATE>
           </D>
        </C>
    </B>
</A>

My apologies.

A better example

<NOTICES>

<SNOTE>

    <DATE>01272011</DATE>
    <ZIP>35807</ZIP>
    <CLASSCOD>A</CLASSCOD>
    <EMAIL>
        <ADDRESS>address 1</ADDRESS>
    </EMAIL>
    <CHANGES>
        <MOD>
            <DATE>02022011</DATE>
            <MODNUM>12345</MODNUM>
            <EMAIL>
                <ADDRESS>address 2</ADDRESS>
            </EMAIL>
        </MOD>
        <MOD>
            <DATE>03022011</DATE>
            <MODNUM>56789</MODNUM>
            <EMAIL>
                <ADDRESS>address 3</ADDRESS>
            </EMAIL>
        </MOD>
    </CHANGES>
</SNOTE>

</NOTICES>

I'm breaking up one large XML file into individual XML files. My original XPath expression is:

/NOTICES/SNOTE

Each individual XML file looks fine, except that it pulls in all of the dates. This is my desired output:

<SNOTE>
    <DATE>01272011</DATE>
    <ZIP>35807</ZIP>
    <CLASSCOD>A</CLASSCOD>
    <EMAIL>
        <ADDRESS>address 1</ADDRESS>
    </EMAIL>
    <CHANGES>
        <MOD>
            <MODNUM>12345</MODNUM>
            <EMAIL>
                <ADDRESS>address 2</ADDRESS>
            </EMAIL>
        </MOD>
        <MOD>
            <MODNUM>56789</MODNUM>
            <EMAIL>
                <ADDRESS>address 3</ADDRESS>
            </EMAIL>
        </MOD>
    </CHANGES>
</SNOTE>

Upvotes: 1

Views: 425

Answers (2)

Dimitre Novatchev

Reputation: 243479

XPath is a query language for XML documents and as such it cannot alter the structure of the document (such as insert/delete/rename nodes).

What you need is an XSLT transformation -- as simple as this:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="DATE[preceding::DATE]"/>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<A>
    <B>
        <DATE>04272011</DATE>
        <C>
            <D>
                <DATE>02022011</DATE>
            </D>
            <D>
                <DATE>03142011</DATE>
            </D>
        </C>
    </B>
</A>

the wanted, correct result is produced:

<A>
   <B>
      <DATE>04272011</DATE>
      <C>
         <D/>
         <D/>
      </C>
   </B>
</A>

Upvotes: 3

LarsH

Reputation: 27994

If by "select the entire tree" you mean "select the set of all the nodes in the tree" (except the non-first DATE elements), that can be done:

"//node()[not(self::DATE) or not(preceding::DATE)]"

Then, the non-first <DATE> element nodes will not themselves be in the selected nodeset, but nodes in the selected nodeset (such as the root node, or <D>) will still have <DATE> descendants.
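
To make that concrete, here is a rough sketch in Python using the standard-library `xml.etree.ElementTree`. That module does not support the `preceding::` axis, so the loop below only emulates the selection, and only for element nodes (the real expression also selects text nodes). The point is that selecting nodes does not change them: the selected `<D>` elements still carry their `<DATE>` children.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<A><B><DATE>04272011</DATE><C>"
    "<D><DATE>02022011</DATE></D>"
    "<D><DATE>03142011</DATE></D>"
    "</C></B></A>"
)

# Emulate the selection for element nodes only: keep every element that is
# not a DATE, plus the first DATE in document order (iter() walks in that order).
selected = []
seen_date = False
for elem in doc.iter():
    if elem.tag != "DATE":
        selected.append(elem)
    elif not seen_date:
        selected.append(elem)
        seen_date = True

selected_tags = [e.tag for e in selected]

# The selected <D> elements are the very same nodes that sit in the tree,
# so their non-selected DATE children are still attached to them.
d_children = [[child.tag for child in d] for d in selected if d.tag == "D"]

print(selected_tags)
print(d_children)
```

The two later `<DATE>` elements are absent from `selected`, yet each selected `<D>` still lists a `DATE` child, which is why serializing a selected node brings the unwanted dates back.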

If instead you want to select the tree (i.e. the root node), or rather a modified version of it, such that <D> elements do not have any <DATE> children, then that requires modification of the tree. XPath can't modify XML trees by itself. You need an XML transformation technology, such as XSLT or an XML DOM library.
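
As an illustration of the library route, here is a minimal sketch in Python with the standard-library `xml.etree.ElementTree` (which is a tree API rather than a W3C DOM, but serves the same purpose here). The helper name `keep_first_date` is made up for this example; it removes every `<DATE>` under a given element except the first one in document order.

```python
import xml.etree.ElementTree as ET


def keep_first_date(root, tag="DATE"):
    """Remove every <tag> element under root except the first in document order."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # iter() yields elements in document order; everything after the first match goes.
    for extra in list(root.iter(tag))[1:]:
        parent_of[extra].remove(extra)
    return root


doc = ET.fromstring(
    "<A><B><DATE>04272011</DATE><C>"
    "<D><DATE>02022011</DATE></D>"
    "<D><DATE>03142011</DATE></D>"
    "</C></B></A>"
)
keep_first_date(doc)
result = ET.tostring(doc, encoding="unicode")
print(result)
```

For the file-splitting case in the question, the same helper could presumably be called once per `<SNOTE>` element (for example, on each item of `notices.findall("SNOTE")`, where `notices` is the parsed `<NOTICES>` root) before writing each one out, so that "first" is judged within each notice rather than across the whole file.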

Upvotes: 1
