I'm trying to get the DOM elements of external pages. Based on other posts, I'm using:
$html = htmlentities(file_get_contents('http://www.slate.com'));
$dom = new domDocument;
$dom->loadHTML($html);
echo "<pre>";
var_dump($dom);
echo "</pre>";
(htmlentities() suppresses the warnings, but otherwise gives the same result as leaving it out.)
Based on what I've read, this should give me the page as a tree of parent/child nodes. But the output of the code above contains no DOM nodes, just a huge "textContent" property holding the entire page's HTML.
Thanks in advance for thoughts on what I'm doing wrong.
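You are looking for $dom->documentElement, which returns the root <html> element as a DOMElement (a subclass of DOMNode). From there you walk its childNodes; var_dump() on the DOMDocument itself won't print the tree for you.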
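A minimal sketch of walking documentElement, assuming a small made-up HTML string in place of the fetched page so it runs offline:

```php
<?php
// Hypothetical sample markup standing in for the fetched page
$html = '<html><head><title>Test</title></head><body><p>Hello</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// documentElement is the root <html> element (a DOMElement, which extends DOMNode)
$root = $dom->documentElement;
echo $root->nodeName, "\n";

// Its direct children are real element nodes, not one big blob of text
$childNames = [];
foreach ($root->childNodes as $child) {
    $childNames[] = $child->nodeName;
}
echo implode(', ', $childNames), "\n";
```

With this input the root is "html" and its children are "head" and "body".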
Also: get rid of the htmlentities() call, because it mangles the HTML you fetch. For example, < becomes &lt;, which loadHTML() won't interpret as the start of a tag, so the entire page ends up as plain text. If the warnings are what bothers you, take a look at: Disable warnings when loading non-well-formed HTML by DomDocument (PHP)
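To see why the htmlentities() call breaks things, here is a small sketch that parses the same snippet twice, once escaped and once raw. The markup is hypothetical; wrapping loadHTML() in libxml_use_internal_errors() is one common way to keep the warnings quiet without escaping the input.

```php
<?php
// Hypothetical sample markup standing in for the fetched page
$html = '<html><body><p>Hello</p></body></html>';

// htmlentities() turns '<' into '&lt;' (and '>' into '&gt;'),
// so the parser sees one long run of text instead of tags
$escaped = htmlentities($html);
$bad = new DOMDocument();
$bad->loadHTML($escaped);
// The entities are decoded back into characters, but only as text content:
// the original markup now survives as a plain string, not as elements
echo $bad->documentElement->textContent, "\n";

// Parse the raw markup instead; libxml_use_internal_errors() collects any
// warnings from messy real-world HTML rather than printing them
$previous = libxml_use_internal_errors(true);
$good = new DOMDocument();
$good->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);
echo $good->getElementsByTagName('p')->length, "\n"; // one real <p> element
```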
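A quick-and-dirty dump function that prints each node's name and nests its children in indented <div>s, so you can see the tree in a browser:
// Load the page as-is (no htmlentities); collect parser warnings instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML(file_get_contents('http://www.slate.com'));
libxml_clear_errors();

// Print a node's name, then recurse into its children inside an indented <div>
function dump(DOMNode $node)
{
    echo $node->nodeName;
    if ($node->hasChildNodes())
    {
        echo '<div style="margin-left:20px; border-left:1px solid black; padding-left: 5px;">';
        foreach ($node->childNodes as $childNode)
        {
            dump($childNode);
        }
        echo '</div>';
    }
}

// Start from the root <html> element
dump($dom->documentElement);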
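If you run the script from the command line rather than in a browser, a plain-text variant may be easier to read. This is a sketch, not part of the original answer: the hypothetical dumpText() helper returns an indented outline instead of echoing nested <div>s, and it skips whitespace-only text nodes to keep the output short. The sample markup is made up.

```php
<?php
// Hypothetical helper: returns an indented outline of node names
function dumpText(DOMNode $node, int $depth = 0): string
{
    // Skip whitespace-only text nodes (formatting between tags)
    if ($node instanceof DOMText && trim($node->nodeValue) === '') {
        return '';
    }
    $out = str_repeat('  ', $depth) . $node->nodeName . "\n";
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $out .= dumpText($child, $depth + 1);
        }
    }
    return $out;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><ul><li>One</li><li>Two</li></ul></body></html>');
echo dumpText($dom->documentElement);
```

Each nesting level adds two spaces, so the <li> elements appear under ul, and their text nodes show up as "#text" one level deeper.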
You should consider using phpQuery (https://github.com/electrolinux/phpquery).