Reputation: 469
I am trying to parse some HTML to get the text between two <hr> tags using DOM with PHP, but I don't get any output when I pass "hr" into getElementsByTagName:
<?php
$dom = new DOMDocument();
$dom->loadHTML("<hr>Text<hr>");
$hr = $dom->getElementsByTagName("hr");
for ($i = 0; $i < $hr->length; $i++) {
    echo "[" . $i . "]" . $hr->item($i)->nodeValue . "<br>";
}
?>
When I run this code, it doesn't output anything. However, if I change "hr" to "*", it outputs:
[0]Text
[1]Text
[2]
[3]
(Why four lines of results?)
I run this code on a web server running PHP 7.1.3. I can't use functions such as file_get_html or str_get_html because PHP reports an error about a call to an undefined function ...
Why doesn't the hr tag produce results?
Upvotes: 2
Views: 576
Reputation: 42716
Perhaps what you're looking for is the contents of the text node between two <hr> elements? In that case we go looking for siblings with an XPath expression:
<?php
$dom = new DOMDocument();
$dom->loadHTML("Some text<hr>The text<hr>Other text");
$xp = new DOMXPath($dom);
// Select text nodes that have an <hr> sibling both before and after them.
$result = $xp->query("//text()[(preceding-sibling::hr and following-sibling::hr)]");
foreach ($result as $i => $node) {
    echo "[$i]$node->textContent<br/>\n";
}
Upvotes: 4
Reputation: 343
This happens because the <hr> element has no child nodes (text nodes count as children too), so its nodeValue is empty.
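To see this concretely, here is a small sketch (not part of the original answer) that lists every element loadHTML() builds from the snippet. It should show the html and body wrappers the parser adds, followed by the two empty hr elements, which would account for the four lines in the question's "*" output. The variable name $lines is only illustrative.

```php
<?php
// Sketch: inspect the tree that loadHTML() creates for the question's snippet.
$dom = new DOMDocument();
$dom->loadHTML("<hr>Text<hr>");

$lines = [];
foreach ($dom->getElementsByTagName("*") as $i => $element) {
    // nodeValue of an element is the concatenated text of its descendants,
    // so an element without child nodes yields an empty string.
    $lines[] = sprintf(
        "[%d] %s: '%s' (child nodes: %d)",
        $i,
        $element->nodeName,
        $element->nodeValue,
        $element->childNodes->length
    );
}
echo implode("\n", $lines), "\n";
```

The text "Text" is a sibling of the two hr elements, not a child of either, so it only shows up in the nodeValue of the ancestors.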
To get the text between the <hr> elements, you have to iterate over all nodes on the same level and check that the current node is a text node (nodeType == 3) and that both its previous sibling and its next sibling are hr elements.
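<?php
$dom = new DOMDocument();
$dom->loadHTML("<hr>Text<hr>");
// loadHTML() wraps the fragment in <html><body>, so walk the body's children
// rather than the children of the document itself.
$body = $dom->getElementsByTagName("body")->item(0);
foreach ($body->childNodes as $childNode) {
    // Skip everything that is not a text node (XML_TEXT_NODE === 3).
    if (XML_TEXT_NODE !== $childNode->nodeType) {
        continue;
    }
    // Element names parsed by loadHTML() are lowercase, hence 'hr'.
    if (!$childNode->previousSibling || ('hr' !== $childNode->previousSibling->nodeName)) {
        continue;
    }
    if (!$childNode->nextSibling || ('hr' !== $childNode->nextSibling->nodeName)) {
        continue;
    }
    echo "{$childNode->nodeValue}\n";
}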
But if you want to get everything between the hr elements, not just a single text node, it will be more complicated.
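One possible way to handle that case, offered as a rough sketch rather than a definitive solution, is to start at the first hr element and collect every following sibling until the next hr is reached. The input string and the variable name $between are made up for illustration, and the sketch assumes both hr elements share the same parent.

```php
<?php
// Hypothetical input with mixed content between the two <hr> elements.
$dom = new DOMDocument();
$dom->loadHTML("<hr>Some <b>bold</b> text<hr>Trailing text");

$hrs = $dom->getElementsByTagName("hr");
$between = "";
if ($hrs->length >= 2) {
    // Walk the siblings after the first <hr> and stop at the next <hr>.
    for ($node = $hrs->item(0)->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node->nodeName === "hr") {
            break;
        }
        // saveHTML($node) serializes a single node, keeping its markup.
        $between .= $dom->saveHTML($node);
    }
}
echo $between, "\n";
```

If only the plain text is needed, appending $node->textContent instead of the serialized markup would be the simpler choice.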
Upvotes: 3