Nubcake

Reputation: 469

PHP DOM parsing text between <hr> tags

I am trying to parse some HTML with PHP's DOM extension to get the text between two <hr> tags, but I don't get any output when I pass "hr" to getElementsByTagName:

<?php
    $dom = new DOMDocument();
    $dom->loadHTML("<hr>Text<hr>");
    // Look up every <hr> element and print its index and text
    $hr = $dom->getElementsByTagName("hr");
    for ($i = 0; $i < $hr->length; $i++) {
        echo "[" . $i . "]" . $hr->item($i)->nodeValue . "<br>";
    }
?>

When I run this code, it doesn't output any text. However, if I change "hr" to "*", it outputs:

[0]Text
[1]Text
[2]
[3]

(Why are there four lines of results?)

I run this code on a web server running PHP 7.1.3. I can't use functions such as file_get_html or str_get_html because calling them gives a "Call to undefined function ..." error.

Why doesn't searching for the hr tag return any text?

Upvotes: 2

Views: 576

Answers (2)

miken32

Reputation: 42716

Perhaps what you're looking for is the contents of the text node between two <hr> elements? In that case, look for text nodes that have an <hr> sibling on both sides, using an XPath expression:

<?php
$dom = new DOMDocument();
$dom->loadHTML("Some text<hr>The text<hr>Other text");
$xp = new DOMXPath($dom);
// Text nodes that have an <hr> sibling somewhere before AND somewhere after them
$result = $xp->query("//text()[preceding-sibling::hr and following-sibling::hr]");
foreach ($result as $i => $node) {
    echo "[$i]$node->textContent<br/>\n";
}
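The expression above matches any text node with some <hr> before it and some <hr> after it, so in markup like "<hr>a<b>x</b>c<hr>" it returns both "a" and "c". If you only want a text node that sits directly between two <hr> elements, a stricter variant can check the nearest sibling on each side with node()[1] (on the reverse preceding-sibling axis, [1] is the closest node). This is a sketch under that assumption; textBetweenAdjacentHr() is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: return the text of nodes whose immediate neighbours
// (of any node type) are both <hr> elements.
function textBetweenAdjacentHr(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xp = new DOMXPath($dom);
    // node()[1] on preceding-sibling / following-sibling is the adjacent node
    $nodes = $xp->query(
        "//text()[preceding-sibling::node()[1][self::hr]"
        . " and following-sibling::node()[1][self::hr]]"
    );
    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = $node->textContent;
    }
    return $texts;
}

print_r(textBetweenAdjacentHr("Some text<hr>The text<hr>Other text"));
```

Which version is right depends on whether other elements between the two <hr> tags should disqualify the text.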

Upvotes: 4

Kyoya

Reputation: 343

This happens because <hr> is an empty element: it has no child nodes (text also counts as a child node), so its nodeValue is empty. To get the text between the <hr> nodes, you have to iterate over all nodes on the same level as the <hr> elements and keep a node only if it is a text node (nodeType == 3), its previous sibling is an <hr> node, and its next sibling is an <hr> node too.

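<?php
    $dom = new DOMDocument();
    $dom->loadHTML("<hr>Text<hr>");

    // loadHTML() wraps the fragment in implied <html><body> elements, so the
    // <hr> nodes and the text are children of <body>, not of $dom itself
    $body = $dom->getElementsByTagName("body")->item(0);

    foreach ($body->childNodes as $childNode) {
        // Only text nodes are of interest (XML_TEXT_NODE == 3)
        if (XML_TEXT_NODE !== $childNode->nodeType) {
            continue;
        }

        // The previous sibling must be an <hr> (the HTML parser lowercases tag names)
        if (!$childNode->previousSibling || ('hr' !== $childNode->previousSibling->nodeName)) {
            continue;
        }

        // ...and so must the next sibling
        if (!$childNode->nextSibling || ('hr' !== $childNode->nextSibling->nodeName)) {
            continue;
        }

        echo "{$childNode->nodeValue}\n";
    }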
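As for the four lines the "*" version prints: a likely explanation is that loadHTML() wraps the fragment in implied <html> and <body> elements, and an element's nodeValue is the combined text of all its descendants. That would make html and body both report "Text" while the two empty hr elements report nothing. This sketch dumps each element's name next to its nodeValue so you can check it yourself; describeElements() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: list every element getElementsByTagName("*") returns,
// in document order, with its nodeValue (the text of all its descendants).
function describeElements(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $rows = [];
    foreach ($dom->getElementsByTagName("*") as $el) {
        $rows[] = $el->nodeName . ": '" . $el->nodeValue . "'";
    }
    return $rows;
}

print_r(describeElements("<hr>Text<hr>"));
```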

But if you want to get everything between the <hr> nodes (elements as well as text), it gets more complicated.
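One possible approach, sketched here under the assumption that the <hr> elements share a single parent (as in the question's markup), is to start at each <hr>, walk its following siblings until the next <hr>, and serialize everything in between with DOMDocument::saveHTML($node). contentBetweenHr() is a hypothetical helper name, and an <hr> nested inside another element is not handled.

```php
<?php
// Hypothetical helper: for each pair of consecutive sibling <hr> elements,
// return the HTML of all nodes (text and elements) between them.
function contentBetweenHr(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $chunks = [];
    foreach ($dom->getElementsByTagName("hr") as $hr) {
        $buffer = "";
        // Walk forward until the next <hr>; if none follows, nothing is recorded
        for ($node = $hr->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeName === "hr") {
                $chunks[] = $buffer;
                break;
            }
            // Serialize this node (text or element) as HTML
            $buffer .= $dom->saveHTML($node);
        }
    }
    return $chunks;
}

print_r(contentBetweenHr("<hr>Text <b>bold</b> more<hr>"));
```

Returning serialized HTML keeps the sketch simple; if you need to keep working with the nodes, you could collect the DOMNode objects instead of strings.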

Upvotes: 3
