Rob

Reputation: 151

Selecting nodes in between comments with XPath from XmlDocument

I'm trying to get nodes in between comments.

example:

<Name>
  <First>a</First>
  <Last>b</Last>
</Name>
<!-- family names -->
<Name>
  <First>c</First>
  <Last>d</Last>
</Name>
<Name>
  <First>e</First>
  <Last>f</Last>
</Name>
<Name>
  <First>g</First>
  <Last>h</Last>
</Name>
<!-- family ends -->
<!-- other names -->
<Name>
  <First>i</First>
  <Last>j</Last>
</Name>
<Name>
  <First>k</First>
  <Last>l</Last>
</Name>
<!-- other ends -->

I'd like to select the nodes between the comments "family names" and "family ends". I've tried several approaches with XPath, but I can't get further than selecting all comment nodes. When I try to select comment nodes containing a given value, I get no result, so I'm not sure how to continue. For example:

var x = xml.SelectSingleNode("//comment()[contains('family names')]");

Thanks in advance.

Upvotes: 2

Views: 1219

Answers (1)

Mathias Müller

Reputation: 22617

What's wrong with your attempt?

An expression like

//comment()[contains('family names')]

is not valid XPath. The contains() function expects two arguments: first the string to search in (or a node, which is coerced into a string by computing its string value), and second the substring to search for. With . as the first argument, the context node (the comment itself) is searched, so the following would have worked:

//comment()[contains(.,'family names')]

But that does not get you far yet, because once you've identified the starting comment, you need to find what comes after it.
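
As a minimal sketch of the corrected predicate in C#, assuming an XmlDocument as in the question: the class name CommentLookup, the helper FindComment, and the <root> wrapper around the sample data are all made up for illustration.

```csharp
using System;
using System.Xml;

public static class CommentLookup
{
    // Returns the first comment whose text contains the marker, or null if none matches.
    // Assumption: the marker contains no single quote, since it is pasted into the XPath string.
    public static XmlNode FindComment(XmlDocument xml, string marker)
    {
        // "." is the context node (the comment), giving contains() its two arguments.
        return xml.SelectSingleNode("//comment()[contains(.,'" + marker + "')]");
    }

    public static void Main()
    {
        var xml = new XmlDocument();
        xml.LoadXml("<root><!-- family names --><Name><First>c</First></Name><!-- family ends --></root>");

        XmlNode start = FindComment(xml, "family names");
        // Value holds the comment text, including its surrounding spaces.
        Console.WriteLine(start == null ? "not found" : start.Value);
    }
}
```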

A correct XPath expression

Use the following expression:

//comment()[contains(.,'family names')]/following::*[not(preceding::comment()[contains(.,'family ends')])]

which translates to

//comment()                         Find comment nodes anywhere in the document
[contains(.,'family names')]        but only select them if they contain the text
                                    "family names"
/following::*                       Select all element nodes that follow those comments
[not(preceding::comment()           but only return them if they are not preceded by
                                    a comment node...
[contains(.,'family ends')])]       ...that contains the text "family ends".

Applied to a well-formed and more sensible input XML document:

Input XML

<root>
<Name>
  <First>NO</First>
  <Last>NO</Last>
</Name>
<!-- family names -->
<Name>
  <First>YES</First>
  <Last>YES</Last>
</Name>
<Name>
  <First>YES</First>
  <Last>YES</Last>
</Name>
<Name>
  <First>YES</First>
  <Last>YES</Last>
</Name>
<!-- family ends -->
<!-- other names -->
<Name>
  <First>NO</First>
  <Last>NO</Last>
</Name>
<Name>
  <First>NO</First>
  <Last>NO</Last>
</Name>
</root>

The result will be (individual results separated by -------):

Output

<Name>
<First>YES</First>
<Last>YES</Last>
</Name>
-----------------------
<First>YES</First>
-----------------------
<Last>YES</Last>
-----------------------
<Name>
<First>YES</First>
<Last>YES</Last>
</Name>
-----------------------
<First>YES</First>
-----------------------
<Last>YES</Last>
-----------------------
<Name>
<First>YES</First>
<Last>YES</Last>
</Name>
-----------------------
<First>YES</First>
-----------------------
<Last>YES</Last>
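
The output above lists First and Last separately because following::* matches every element, nested ones included. Here is a rough sketch of how the expression could be run from C# with XmlDocument.SelectNodes, together with a variant that uses the name test Name instead of * so that only the Name elements come back. The variant is an addition here, not part of the original expression, and the names FamilySelector, AllElements, NameElementsOnly and Select are hypothetical. The input is a shortened version of the sample document.

```csharp
using System;
using System.Xml;

public static class FamilySelector
{
    // The expression from above: every element after the start comment
    // that has no "family ends" comment before it.
    public const string AllElements =
        "//comment()[contains(.,'family names')]" +
        "/following::*[not(preceding::comment()[contains(.,'family ends')])]";

    // Variant (an assumption about what is wanted): only Name elements,
    // so nested First and Last elements are not returned on their own.
    public const string NameElementsOnly =
        "//comment()[contains(.,'family names')]" +
        "/following::Name[not(preceding::comment()[contains(.,'family ends')])]";

    public static XmlNodeList Select(XmlDocument xml, string xpath)
    {
        return xml.SelectNodes(xpath);
    }

    public static void Main()
    {
        var xml = new XmlDocument();
        xml.LoadXml(
            "<root>" +
            "<Name><First>NO</First><Last>NO</Last></Name>" +
            "<!-- family names -->" +
            "<Name><First>YES</First><Last>YES</Last></Name>" +
            "<Name><First>YES</First><Last>YES</Last></Name>" +
            "<!-- family ends -->" +
            "<Name><First>NO</First><Last>NO</Last></Name>" +
            "</root>");

        // Writes the serialized form of each selected Name element on its own line.
        foreach (XmlNode node in Select(xml, NameElementsOnly))
        {
            Console.WriteLine(node.OuterXml);
        }
    }
}
```

If the Name elements are always siblings of the comments, following-sibling::Name would work as well and avoids scanning the rest of the document.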

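Whoever designed this XML document did not design it very cleverly, if you'll pardon my French. Relying on comments with specific text in specific positions is fragile: comments are not meant to carry structure, and the selection breaks as soon as one of them is reworded, moved or removed.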

Upvotes: 2
