KJW

Reputation: 15251

xpath: select text nodes before and after break tags

Consider the following fragment, which mixes <br> and <br/>:

text1
<br>
text2
<br/>
text3
<br/>
text4
<br>
text5

How can I select each of the text nodes?

I am thinking of a condition that matches any text node preceding OR following a br tag, but I am unsure whether <br> and <br/> are treated differently in XPath.

Upvotes: 2

Views: 6491

Answers (2)

DOMDocument's loadHtml() method works well with invalid HTML fragments, so you can use DOMXPath this way:

<?php

$html = 'text1
<br>
text2
<br/>
text3
<br/>
text4
<br>
text5';

echo "<pre>" . htmlentities($html) . "</pre><br>\n";

$dom = new DOMDocument();
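// loadHTML() assumes ISO-8859-1 unless told otherwise, so convert non-ASCII
// characters to HTML entities first to keep UTF-8 text intact
// (note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2)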
$dom->loadHtml(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));

$xpath = new DOMXPath($dom);

echo "Text nodes preceding br:";
foreach($xpath->query('//text()[(following::br)]') as $node)
{
    var_dump($node->wholeText);
}

echo "Text nodes following br:";
foreach($xpath->query('//text()[(preceding::br)]') as $node)
{
    var_dump($node->wholeText);
}

echo "Text nodes following OR preceding br:";
foreach($xpath->query('//text()[(following::br) or (preceding::br)]') as $node)
{
    var_dump($node->wholeText);
}
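
On the question of whether <br> and <br/> are treated differently: once loadHTML() has parsed the fragment, both spellings should end up as the same br element, so an XPath name test does not distinguish them. The sketch below checks that by counting br elements, and also narrows the match to text nodes directly adjacent to a br by using the sibling axes instead of following:: and preceding::. The helper name adjacentTexts() is made up for illustration, and the trimming of whitespace is my own assumption about the desired output.

```php
<?php
// Sketch: <br> and <br/> both become plain "br" elements after parsing.
$html = "text1\n<br>\ntext2\n<br/>\ntext3\n<br/>\ntext4\n<br>\ntext5";

// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Counts every br element, however it was written in the source.
$brCount = $xpath->query('//br')->length;

// Hypothetical helper: trimmed text of every text node whose nearest
// element sibling (on either side) is a br.
function adjacentTexts(DOMXPath $xpath): array
{
    $query = '//text()[preceding-sibling::*[1][self::br]'
           . ' or following-sibling::*[1][self::br]]';
    $out = [];
    foreach ($xpath->query($query) as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $out[] = $text; // skip whitespace-only nodes
        }
    }
    return $out;
}

$texts = adjacentTexts($xpath);
echo "br elements: $brCount\n";
print_r($texts);
```

Compared with following::br, the sibling form does not match text elsewhere in the document that merely happens to come before some br further down the page.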

Upvotes: 5

MrJoel

Reputation: 42

Your example is not valid XML against which an XPath query can be run: the bare <br> elements are never closed.

However, in general you would select text nodes with the text() node test, something like //br/text()
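
A quick way to check an expression like this is to count what it matches. The sketch below assumes the fragment is parsed with PHP's DOMDocument::loadHTML(), as in the other answer, and uses a shortened copy of the sample input. //br/text() asks for text nodes that are children of a br; since br is an empty element, it would be expected to match nothing, whereas the following-sibling axis reaches the text that comes after each break.

```php
<?php
// Sketch: compare text() as a child of br with text() as a sibling of br.
$html = "text1\n<br>\ntext2\n<br/>\ntext3";

// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Text nodes that are children of a br element.
$children = $xpath->query('//br/text()')->length;

// The first text node that follows each br as a sibling.
$after = [];
foreach ($xpath->query('//br/following-sibling::text()[1]') as $node) {
    $after[] = trim($node->nodeValue);
}

echo "text children of br: $children\n";
echo "text after br: " . implode(', ', $after) . "\n";
```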

Upvotes: -1
