You Old Fool

Reputation: 22959

How to use PHP DOMDocument saveHTML($node) without added whitespace?

If I call saveHTML() without the optional DOMNode parameter, it works as expected:

$html = '<html><body><div>123</div><div>456</div></body></html>';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = true;
$dom->formatOutput = false;
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD);
echo $dom->saveHTML();
<html><body><div>123</div><div>456</div></body></html>

But when I pass a DOMNode to output only a subset of the document, it seems to ignore the formatOutput property and adds unwanted line breaks:

$body = $dom->getElementsByTagName('body')->item(0);
echo $dom->saveHTML($body);
<body>
<div>123</div>
<div>456</div>
</body>

What gives? Is this a bug? Is there a workaround?

Upvotes: 6

Views: 1884

Answers (3)

Rain

Reputation: 3936

Is this a bug?

Yes, it's a bug, and it has been reported here

Is there a workaround?

Stick with Nigel's solution (saveXML()) for now.

Did they fix it?

Yes, it is fixed as of PHP 7.3.0alpha3.

Check it here
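If the same code has to run on PHP versions both before and after the fix, one option is to pick the serializer at runtime. This is a minimal sketch, assuming the fix landed in 7.3.0 as stated above; the helper name saveNodeHtml() is hypothetical and not part of the DOM extension.

```php
<?php
// Hypothetical helper (not part of the DOM extension): serialize a single
// node without the extra line breaks. Assumes the bug is fixed in PHP 7.3.0+.
function saveNodeHtml(DOMDocument $dom, DOMNode $node): string
{
    if (PHP_VERSION_ID >= 70300) {
        // Fixed versions honour formatOutput = false for a single node.
        return $dom->saveHTML($node);
    }
    // Older versions: fall back to saveXML(), which is only safe when the
    // markup is also well-formed XML.
    return $dom->saveXML($node);
}

$dom = new DOMDocument;
$dom->formatOutput = false;
$dom->loadHTML('<html><body><div>123</div><div>456</div></body></html>', LIBXML_HTML_NODEFDTD);
$body = $dom->getElementsByTagName('body')->item(0);

echo saveNodeHtml($dom, $body), "\n";
```

PHP_VERSION_ID is 70300 for every 7.3.0 pre-release as well, so alpha builds older than alpha3 would still take the saveHTML() branch. That edge case is unlikely to matter in practice.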

Upvotes: 5

Nigel Ren

Reputation: 57131

If you know your document is going to be valid XML as well, you can use saveXML() instead...

$html = '<html><body><div>123</div><div>456</div></body></html>';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = true;
$dom->formatOutput = false;
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD);
$body = $dom->getElementsByTagName('body')->item(0);
echo $dom->saveXML($body);

which gives...

<body><div>123</div><div>456</div></body>
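The "valid XML" caveat matters because saveXML() serializes with XML rules, not HTML rules. For example, an empty element may be written self-closed (such as `<div/>`), and HTML parsers treat that as an unclosed div. The sketch below uses an assumed input containing an empty div. It shows the LIBXML_NOEMPTYTAG option of saveXML(), which expands empty tags. One side effect is that void elements such as `<br>` would then come out as `<br></br>`.

```php
<?php
// Assumed input: markup containing an empty <div>.
$html = '<html><body><div></div><div>456</div></body></html>';
$dom = new DOMDocument;
$dom->formatOutput = false;
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD);
$body = $dom->getElementsByTagName('body')->item(0);

// Default XML serialization: empty elements may be self-closed, e.g. <div/>.
echo $dom->saveXML($body), "\n";

// LIBXML_NOEMPTYTAG writes empty elements as an explicit open/close pair.
echo $dom->saveXML($body, LIBXML_NOEMPTYTAG), "\n";
```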

Upvotes: 4

Patrick Q

Reputation: 6393

Well, it's a pretty ugly workaround, but it gets the job done:

$html = '<html><body><div>123</div><div>456</div></body></html>';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = true;
$dom->formatOutput = false;
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD);
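$body = $dom->getElementsByTagName('body')->item(0);
// Serialize the <body> node and strip the line breaks saveHTML() added
$bodyHtml = str_replace("\n", "", $dom->saveHTML($body));
// Reload only that fragment, without implied wrapper elements or a doctype
$dom->loadHTML($bodyHtml, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);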

echo $dom->saveHTML();

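Since saveHTML() returns a string, pass the node to it, strip the line breaks from the result, and then pass that string back to loadHTML(). Note that str_replace() also removes any line breaks that belong to the node's own text content.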

Upvotes: 2
