Riccardo

Reputation: 2226

DOMDocument to extract part of a webpage (Any encoding)?

How can I store everything between a webpage's <body> and </body> tags in a string?

I've heard of DOMDocument, but I'm a complete beginner, so a code sample would help!

Upvotes: 0

Views: 1289

Answers (2)

Riccardo

Reputation: 2226

Found this solves the problem!

Upvotes: 0

Artefacto

Reputation: 97835

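$d = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them;
// real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$d->loadHTMLFile("http://stackoverflow.com");
// item(0) returns null when the document has no <body> element.
$b = $d->getElementsByTagName("body")->item(0);
if ($b !== null) {
    echo simplexml_import_dom($b)->asXML();
}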

This will also include the <body> tag, and the content will have been modified to be well-formed XML.

To omit the <body> tags (the output then has no single root element, so it is no longer well-formed XML):

$d = new DOMDocument();
libxml_use_internal_errors(true);
$d->loadHTMLFile("http://stackoverflow.com");
$b = $d->getElementsByTagName("body")->item(0);
if ($b !== null) {
    for ($n = $b->firstChild; $n !== null; $n = $n->nextSibling) {
        // saveXML() serializes any node type; simplexml_import_dom() only
        // accepts element nodes and fails on text nodes and comments.
        echo $d->saveXML($n);
    }
}
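
The question asks for the content in a string rather than echoed. A minimal sketch of that, assuming PHP 5.3.6 or later (where DOMDocument::saveHTML() accepts a node argument): the helper name bodyInnerHtml and the sample markup are hypothetical, and the sketch parses an in-memory string with loadHTML() instead of fetching a URL. It serializes as HTML rather than XML, so the exact whitespace between elements may differ from the input.

```php
<?php
// Hypothetical helper (not from the original answer): returns the markup
// found between <body> and </body> as a single string.
function bodyInnerHtml(string $html): string
{
    $doc = new DOMDocument();

    // Collect parse warnings internally, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child of <body> and concatenate the results,
    // which leaves out the <body> tag itself.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Sample input, assumed purely for illustration.
$page = '<html><head><title>t</title></head>'
      . '<body><p>Hello</p><p>World</p></body></html>';
$inner = bodyInnerHtml($page);
```

To work on a live page, the same loop can be applied after loadHTMLFile() with a URL, which needs allow_url_fopen enabled.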

Upvotes: 1
