Nitesh morajkar
Reputation: 431

Get entire BODY content using PHP DOMDocument

I want to get the entire content of the body tag, markup included, using DOMDocument.

I used the following code:

$dom = new DOMDocument;

/*** load the HTML into the object ***/
$dom->loadHTML($html);

/*** get the body element by its tag name ***/
$body = $dom->getElementsByTagName('body')->item(0)->nodeValue;

This gives me only the text. I want the entire body content, including the HTML tags.

Upvotes: 14

Views: 27489

Answers (3)

Spooky
Reputation: 1316

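$dom = new DOMDocument;
$dom->loadHTML($html);

// ... change, replace ...
// ... mock, traverse ..

// assumes <body> is the last child of the root <html> element
$body = $dom->documentElement->lastChild;
// saveHTML() returns the markup as a string, so print (or store) the result
echo $dom->saveHTML($body);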

Upvotes: 5

lubosdz
Reputation: 4500

It is safer to use the PHP tidy extension, which can repair invalid XHTML structures and also extract only the body:

$tidy = new tidy();
$htmlBody = $tidy->repairString($html, array(
    'output-xhtml' => true,
    'show-body-only' => true,
), 'utf8');

Then load the extracted body into DOMDocument:

$xml = new DOMDocument();
$xml->loadHTML($htmlBody);

Upvotes: 4

VolkerK
Reputation: 96159

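You can pass the body DOMElement to DOMDocument::saveHTML() (the node argument is available since PHP 5.3.6). Note that DOMDocument::saveHTMLFile() only takes a filename and always writes the whole document, e.g.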

<?php
$doc = new DOMDocument;
$doc->loadHTMLFile('http://stackoverflow.com');

$body = $doc->getElementsByTagName('body');
if ( $body && 0 < $body->length ) {
    $body = $body->item(0);
    echo $doc->saveHTML($body);
}

prints

Warning: DOMDocument::loadHTMLFile(): Unexpected end tag : p in http://stackoverflow.com, line: 2843 [...]
<body class="home-page">
<noscript><div id="noscript-padding"></div></noscript>
<div id="notify-container"></div>
<div id="overlay-header"></div>
<div id="custom-header"></div>
<div class="container">
        <div id="header">
            <div id="portalLink">
[...]
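
If you only want what is inside <body>, without the <body> tag itself, one option is to serialize each child node of the body and concatenate the results. The following is a minimal sketch, not a definitive implementation: bodyInnerHtml() is a hypothetical helper name (it is not part of the DOM extension), the type hints assume PHP 7 or later, and the libxml_use_internal_errors() calls are only there to keep parser warnings like the one above out of the output.

```php
<?php
// Hypothetical helper: returns the markup inside <body>, without the <body> tag.
function bodyInnerHtml(string $html): string
{
    // Nothing to parse; loadHTML() rejects an empty string.
    if ($html === '') {
        return '';
    }

    $doc = new DOMDocument();

    // Collect libxml parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize every child of <body> and join the pieces.
    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

echo bodyInnerHtml('<html><body><p>Hello <b>world</b></p></body></html>');
```

Whether you want the <body> wrapper or only its children depends on what you do with the result; the loop above just skips the wrapper.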

Upvotes: 16
