daprezjer

Reputation: 201

domDocument is not returning node information

I'm attempting to get the DOM elements of external pages. Based on other posts I'm trying:

$html = htmlentities(file_get_contents('http://www.slate.com'));    
$dom = new domDocument;
$dom->loadHTML($html);
echo "<pre>";
var_dump($dom);
echo "</pre>";

(htmlentities() suppresses the warnings, but the result is otherwise the same as leaving it out.)

Based on what I've read, this should return various DOM parts in parent/child nodes. But the result of the code above contains no DOM nodes, just a huge "textContent" element that contains the entire page HTML.

Thanks in advance for thoughts on what I'm doing wrong.

Upvotes: 1

Views: 33

Answers (2)

SpazzMarticus

Reputation: 1277

You are looking for $dom->documentElement, which returns the root element as a DOMNode object (more precisely, a DOMElement).

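Also, get rid of htmlentities(), because it mangles the HTML you fetch: every < becomes &lt; and every > becomes &gt;, so loadHTML sees plain text rather than tags. That is why you end up with one large block of text instead of a tree of elements. To silence the parser warnings instead, take a look at: Disable warnings when loading non-well-formed HTML by DomDocument (PHP)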
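As a rough sketch of that approach (not taken from the linked post verbatim), the warnings can be collected with libxml_use_internal_errors() rather than hidden by escaping the markup. The inline $html string is a made-up stand-in for the fetched page, so the snippet runs without network access; in the question it would be the raw result of file_get_contents(), without htmlentities().

```php
<?php
// Hypothetical stand-in for the fetched page (deliberately malformed: <b> is never closed).
$html = '<html><head><title>Demo</title></head><body><p>Hello <b>world</p></body></html>';

// Collect libxml parse warnings internally instead of emitting them.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parsed tree hangs off documentElement (the <html> element).
$root = $dom->documentElement;
echo $root->nodeName, "\n";

// Elements can now be queried as nodes rather than as one block of text.
$firstParagraph = $dom->getElementsByTagName('p')->item(0);
echo $firstParagraph->nodeName, "\n";
```

Restoring the previous libxml setting afterwards keeps the change local, which matters if other code in the same request relies on seeing parser errors.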

Dummy-Dump:

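// Recursively prints the node name, then its children inside an indented <div>.
function dump(DOMNode $node)
{
    echo $node->nodeName;
    // Text nodes and empty elements have no children, so the recursion stops there.
    if ($node->hasChildNodes())
    {
        echo '<div style="margin-left:20px; border-left:1px solid black; padding-left: 5px;">';
        foreach ($node->childNodes as $childNode)
        {
            dump($childNode);
        }
        echo '</div>';
    }
}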

dump($dom->documentElement);

Which looks like:

[Image: Dummy-Dump]
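If you run the script from the command line, where the HTML <div> indentation is not rendered, a plain-text variant may be easier to read. This is only a sketch: dumpText() and its $depth parameter are hypothetical names, not part of the answer above, and the inline HTML string is a made-up sample.

```php
<?php
// Hypothetical helper: returns the node tree as text, two spaces per nesting level.
function dumpText(DOMNode $node, int $depth = 0): string
{
    $out = str_repeat('  ', $depth) . $node->nodeName . "\n";
    // Guard with hasChildNodes() so leaf nodes (e.g. text nodes) are never iterated.
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $childNode) {
            $out .= dumpText($childNode, $depth + 1);
        }
    }
    return $out;
}

// Made-up sample document so the snippet is self-contained.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>Hi</p></body></html>');

$tree = dumpText($dom->documentElement);
echo $tree;
```

Returning a string instead of echoing directly also makes the result easy to log or compare in a test.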

Upvotes: 1

Łukasz

Reputation: 16

You should consider using phpQuery (https://github.com/electrolinux/phpquery).

Upvotes: 0
