mmo

Reputation: 4214

XPath: How to select all sibling nodes up to one fulfilling some condition?

I am trying to write an XPath expression that returns all following sibling nodes up to the first one that satisfies a specific condition. In my case I have an (X)HTML list in which some list items have a specific class and the others have no class.

To visualize: I am standing at one of the list items that DO have the class "foo" (e.g. the li containing the text "D"), and I want to get a list of the subsequent li's containing "E", "F" and "G", but none of the later items containing "H", "I" and "J".

...
<li class="foo">A</li>
<li>B</li>
<li>C</li>
<li class="foo">D</li>
<li>E</li>
<li>F</li>
<li>G</li>
<li class="foo">H</li>
<li>I</li>
<li>J</li>
...

I am using Java v1.8 and its built-in javax.xml.xpath package accessing a previously parsed org.w3c.dom.Document.

Note: I have googled extensively for a solution and I am aware that there are quite a number of very similar looking examples, even here on StackOverflow, but none of these worked for me! Whatever I tried and adapted to the case at hand gave me either only the first element ("E" in this example) or none at all. :-(

Later addition:

Since I apparently expressed myself so badly, I am appending a test-program:

package pull_lis;

import java.io.FileInputStream;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

public class TestXPathExpression
{
    public static void main(String[] args) throws Exception {
        Tidy tidy = new Tidy();
        XPathFactory xpathfactory = XPathFactory.newInstance();
        XPath xpath = xpathfactory.newXPath();

        Document doc = tidy.parseDOM(new FileInputStream("sample.xml"), System.out);

        XPathExpression expr1 = xpath.compile("//li[@class='foo']");

//      XPathExpression expr2 = xpath.compile("//li[@class='foo'][2]/following-sibling::li[@class='foo'][1]/preceding-sibling::li[preceding-sibling::li[@class='foo'][2]]");
        XPathExpression expr2 = xpath.compile("???"); // <<<< IT IS THIS EXPRESSION THAT I AM SEEKING

        NodeList foos = (NodeList)expr1.evaluate(doc, XPathConstants.NODESET);
        System.out.println(foos.getLength() + " foos found.");

        for (int idx1 = 0; idx1 < foos.getLength(); idx1++) {
            Node foo = foos.item(idx1);
            System.out.println("foo[" + idx1 + "]: " + foo.getChildNodes().item(0).getNodeValue());
            NodeList nodes = (NodeList)expr2.evaluate(foo, XPathConstants.NODESET);
            for (int idx2 = 0; idx2 < nodes.getLength(); idx2++) {
                Node node = nodes.item(idx2);
                System.out.println("non-foo[" + idx2 + "]: " + node.getChildNodes().item(0).getNodeValue());
            }
        }
    }
}

sample.xml contains:

<html>
    <head>
        <title>Example</title>
    </head>
    <body>
        <ul>
            <li class="foo">A</li>
            <li>B</li>
            <li>C</li>
            <li class="foo">D</li>
            <li>E</li>
            <li>F</li>
            <li>G</li>
            <li class="foo">H</li>
            <li>I</li>
            <li>J</li>
        </ul>
    </body>
</html>

If I run the above program on sample.xml using the expression provided by kjhughes, I get:

3 foos found.
foo[0]: A
non-foo[0]: E
non-foo[1]: F
non-foo[2]: G
foo[1]: D
non-foo[0]: E
non-foo[1]: F
non-foo[2]: G
foo[2]: H
non-foo[0]: E
non-foo[1]: F
non-foo[2]: G

but what I want/need is:

3 foos found.
foo[0]: A
non-foo[0]: B
non-foo[1]: C
foo[1]: D
non-foo[0]: E
non-foo[1]: F
non-foo[2]: G
foo[2]: H
non-foo[0]: I
non-foo[1]: J

Hope I could make myself a bit clearer this time...

M.

Upvotes: 0

Views: 703

Answers (2)

kjhughes

Reputation: 111686

Given this XHTML:

<ul>
  <li class="foo">A</li>
  <li>B</li>
  <li>C</li>
  <li class="foo">D</li>
  <li>E</li>
  <li>F</li>
  <li>G</li>
  <li class="foo">H</li>
  <li>I</li>
  <li>J</li>
</ul>

This XPath:

//li[. = 'D']/following-sibling::li[@class='foo'][1]/preceding-sibling::li[preceding-sibling::li[. = 'D']]

Will return the li elements after the starting <li>D</li> but before the next li with class='foo':

<li>E</li>
<li>F</li>
<li>G</li>

Update

OP has stated in comments that the first node of interest should be marked not by its contents of "D" but by being the second li with @class="foo".

Here is the above XPath adapted to start from this new criterion:

//li[@class='foo'][2]/following-sibling::li[@class='foo'][1]/preceding-sibling::li[preceding-sibling::li[@class='foo'][2]]

It selects the "E", "F", and "G" li elements as requested.
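A note on the test program in the question: an expression that begins with // is evaluated from the document root, whatever context node is passed to evaluate(), which is why it prints "E", "F", "G" for every foo item. The following is only a sketch of one possible workaround, not part of the answer above. It assumes two XPath 1.0 evaluations per item are acceptable: first count the foo siblings up to and including the context node, then select the following non-foo siblings that have exactly that many foo siblings before them. The class name RelativeXPathSketch and the helper groups are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPathSketch {

    // Same list as in the question, without the surrounding html/body.
    static final String SAMPLE = "<ul>"
            + "<li class=\"foo\">A</li><li>B</li><li>C</li>"
            + "<li class=\"foo\">D</li><li>E</li><li>F</li><li>G</li>"
            + "<li class=\"foo\">H</li><li>I</li><li>J</li>"
            + "</ul>";

    // Hypothetical helper: for each foo item, the texts of the items up to the next foo.
    static List<List<String>> groups(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList foos = (NodeList) xpath.evaluate("//li[@class='foo']", doc, XPathConstants.NODESET);

            List<List<String>> result = new ArrayList<>();
            for (int i = 0; i < foos.getLength(); i++) {
                Node foo = foos.item(i);

                // Step 1: position of this foo among its foo siblings (1-based).
                Double k = (Double) xpath.evaluate(
                        "count(preceding-sibling::li[@class='foo']) + 1", foo, XPathConstants.NUMBER);

                // Step 2: relative expression (no leading //), so the context node matters.
                String expr = "following-sibling::li[not(@class='foo')]"
                        + "[count(preceding-sibling::li[@class='foo']) = " + k.intValue() + "]";
                NodeList nodes = (NodeList) xpath.evaluate(expr, foo, XPathConstants.NODESET);

                List<String> texts = new ArrayList<>();
                for (int j = 0; j < nodes.getLength(); j++) {
                    texts.add(nodes.item(j).getTextContent());
                }
                result.add(texts);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(groups(SAMPLE));
    }
}
```

For the sample list this groups B and C under A, E to G under D, and I and J under H, including the last group that has no closing foo item.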

Upvotes: 3

Michael Kay

Reputation: 163458

I've tried to remember all my XPath 1.0 programming tricks, and I've come to the conclusion it can't be done in a single XPath 1.0 expression. That's a bold statement and someone may prove me wrong.

But since you're in Java, you're not restricted to XPath 1.0. Get yourself an XPath 2.0 library (e.g. Saxon), then you can write

for $N in following-sibling::li[@class='foo'][1] 
return following-sibling::li[. << $N]

Alternatively, since you're using DOM (why does anyone use DOM nowadays?) just iterate over the following siblings in your Java code until you find the one that matches.
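The DOM iteration could look roughly like the sketch below. It assumes the list items are direct siblings, as in the question's sample, and that "matches" means class="foo"; the class name SiblingsUntilFoo and the helpers textsUntilNextFoo and groups are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SiblingsUntilFoo {

    // Same list as in the question, without the surrounding html/body.
    static final String SAMPLE = "<ul>\n"
            + "  <li class=\"foo\">A</li>\n  <li>B</li>\n  <li>C</li>\n"
            + "  <li class=\"foo\">D</li>\n  <li>E</li>\n  <li>F</li>\n  <li>G</li>\n"
            + "  <li class=\"foo\">H</li>\n  <li>I</li>\n  <li>J</li>\n"
            + "</ul>";

    static boolean isFoo(Element e) {
        return "foo".equals(e.getAttribute("class"));
    }

    // Walk the following siblings of start and stop at the next li with class "foo".
    static List<String> textsUntilNextFoo(Node start) {
        List<String> texts = new ArrayList<>();
        for (Node n = start.getNextSibling(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes and comments
            }
            Element e = (Element) n;
            if (!"li".equals(e.getTagName())) {
                continue; // ignore siblings that are not list items
            }
            if (isFoo(e)) {
                break; // reached the next foo item
            }
            texts.add(e.getTextContent());
        }
        return texts;
    }

    // Hypothetical helper: one list of texts per foo item, in document order.
    static List<List<String>> groups(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("li");
            List<List<String>> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element li = (Element) items.item(i);
                if (isFoo(li)) {
                    result.add(textsUntilNextFoo(li));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(groups(SAMPLE));
    }
}
```

Because the loop simply runs off the end of the sibling list when no further foo item exists, the trailing group (I and J after H) is collected as well.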

Upvotes: 0
