Kermit

Reputation: 34062

Iterating through child li nodes in xpath

I have the following HTML:

$page = '<html>
<head>
<title>Page</title>
</head>

<body>

<div>
    <div>
        <div>
        </div>
        <div class="this one">
            <h2>Ignore</h2>
            <p>Text</p>

            <h2>Header 1</h2>
            <ul><li>List Value 1</li></ul>

            <h2>Header 2</h2>
            <ul><li>List Value 2</li></ul>

            <h2>Ignore</h2>
            <ul><li>List Value 3</li></ul>

            <h2>Header 3</h2>
            <ul>
                <li>List Value A</li>
                <li>List Value B</li>
                <li>List Value C</li>
            </ul>

            <h2>Ignore</h2>
            <p>Text</p>
        </div>
    </div>
</div>

</body>
</html>';

I am trying to get only the <li> items listed under Header 3, but the following code doesn't work:

$doc = new DOMDocument();
$doc->loadHTML($page);
$xpath = new DOMXPath($doc);

$nodes = $xpath->query("//div[@class='this one']/h2[.='Header 3']/ul/li");
foreach ($nodes as $node) {
    echo $node->nodeValue . "<br />";
}

I am expecting the output:

List Value A<br />
List Value B<br />
List Value C<br />

Upvotes: 1

Views: 251

Answers (1)

Sean Bright

Reputation: 120714

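In your markup the <ul> is a sibling of the <h2>, not a child of it, so h2[...]/ul/li matches nothing. This is the expression that you want: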

//div[@class = 'this one']/h2[text() = 'Header 3']/following-sibling::ul[1]/li

Broken down a bit:

  • //div[@class = 'this one'] - Match all <div>s in the document whose class attribute is exactly 'this one'

  • …/h2[text() = 'Header 3'] - Match the <h2>s that are children of those <div>s and whose text is exactly 'Header 3'

  • …/following-sibling::ul - Use the following-sibling axis to match the <ul>s that come after each matched <h2> and share its parent

  • …[1] - Keep only the first of those <ul>s, i.e. the one nearest to the matched <h2> (remembering that indexes are 1-based in XPath expressions)

  • …/li - And match all of the <li>s which are children of that <ul>
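
Plugged into the code from the question, the expression can be tried roughly as follows. This is only a minimal sketch: the markup is a trimmed copy of the question's HTML, and the variable names $html, $expr and $values are illustrative, not part of the original code.

```php
<?php
// Trimmed copy of the question's markup ($html is an illustrative name).
$html = '<html><body>
<div class="this one">
    <h2>Header 2</h2>
    <ul><li>List Value 2</li></ul>
    <h2>Header 3</h2>
    <ul>
        <li>List Value A</li>
        <li>List Value B</li>
        <li>List Value C</li>
    </ul>
    <h2>Ignore</h2>
    <p>Text</p>
</div>
</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// <h2> with the wanted text, then its nearest following <ul> sibling, then that list's items.
$expr = "//div[@class = 'this one']/h2[text() = 'Header 3']/following-sibling::ul[1]/li";

// Collect the text of every matched <li>.
$values = [];
foreach ($xpath->query($expr) as $li) {
    $values[] = trim($li->nodeValue);
}

echo implode("<br />\n", $values), "<br />\n";
```

On this trimmed markup the loop collects List Value A, List Value B and List Value C. The [1] matters as soon as more lists follow the heading: without it, searching for an earlier heading such as 'Header 2' would also return the items of every later <ul> sibling.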

Upvotes: 3
