Reputation: 33
Given the following structure, I want to use XPath to select the entire tree but include only the first date, excluding all of the other dates. The number of dates after the first one is not constant. Any ideas? My apologies if the format isn't correct.
<A>
  <B>
    <DATE>04272011</DATE>
    <C>
      <D>
        <DATE>02022011</DATE>
      </D>
      <D>
        <DATE>03142011</DATE>
      </D>
    </C>
  </B>
</A>
My apologies. Here is a better example:
<NOTICES>
  <SNOTE>
    <DATE>01272011</DATE>
    <ZIP>35807</ZIP>
    <CLASSCOD>A</CLASSCOD>
    <EMAIL>
      <ADDRESS>address 1</ADDRESS>
    </EMAIL>
    <CHANGES>
      <MOD>
        <DATE>02022011</DATE>
        <MODNUM>12345</MODNUM>
        <EMAIL>
          <ADDRESS>address 2</ADDRESS>
        </EMAIL>
      </MOD>
      <MOD>
        <DATE>03022011</DATE>
        <MODNUM>56789</MODNUM>
        <EMAIL>
          <ADDRESS>address 3</ADDRESS>
        </EMAIL>
      </MOD>
    </CHANGES>
  </SNOTE>
</NOTICES>
I'm breaking up one large XML file into individual XML files. My original XPath statement is:
/NOTICES/SNOTE
Each individual XML file looks fine, except that it pulls in all of the dates. This is my desired output:
<SNOTE>
  <DATE>01272011</DATE>
  <ZIP>35807</ZIP>
  <CLASSCOD>A</CLASSCOD>
  <EMAIL>
    <ADDRESS>address 1</ADDRESS>
  </EMAIL>
  <CHANGES>
    <MOD>
      <MODNUM>12345</MODNUM>
      <EMAIL>
        <ADDRESS>address 2</ADDRESS>
      </EMAIL>
    </MOD>
    <MOD>
      <MODNUM>56789</MODNUM>
      <EMAIL>
        <ADDRESS>address 3</ADDRESS>
      </EMAIL>
    </MOD>
  </CHANGES>
</SNOTE>
Upvotes: 1
Views: 425
Reputation: 243479
XPath is a query language for XML documents and as such it cannot alter the structure of the document (such as insert/delete/rename nodes).
What you need is an XSLT transformation -- as simple as this:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="DATE[preceding::DATE]"/>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<A>
  <B>
    <DATE>04272011</DATE>
    <C>
      <D>
        <DATE>02022011</DATE>
      </D>
      <D>
        <DATE>03142011</DATE>
      </D>
    </C>
  </B>
</A>
the wanted, correct result is produced:
<A>
  <B>
    <DATE>04272011</DATE>
    <C>
      <D/>
      <D/>
    </C>
  </B>
</A>
Upvotes: 3
Reputation: 27994
If by "select the entire tree" you mean "select the set of all the nodes in the tree" (except the non-first DATE elements), that can be done:
"//node()[not(self::DATE) or not(preceding::DATE)]"
Then the non-first <DATE> element nodes will not themselves be in the selected node-set, but nodes in the selected node-set (such as the root node, or <D>) will still have <DATE> descendants.
If instead you want to select the tree (i.e. the root node), or rather a modified version of it, such that <D> elements do not have any <DATE> children, then that requires modification of the tree. XPath can't modify XML trees by itself. You need an XML transformation technology, such as XSLT or an XML DOM library.
Upvotes: 1