I have the following HTML code:
$pageHTML = '<html>
<head></head>
<body>
<div class="some class">
<header>Header</header>
<section>Section</section>
<footer>Footer</footer>
</div>
</body>
</html>';
I need to remove the outer <div> tags while keeping all of its inner HTML inside the <body>.
If I try:
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($pageHTML);
libxml_use_internal_errors(false);
$bodyDivs = [];
foreach($dom->getElementsByTagName('body')[0]->childNodes as $bodyChild) {
if($bodyChild->nodeName == 'div') {
$bodyDivs[] = $bodyChild;
}
}
if(count($bodyDivs) == 1) {
foreach($bodyDivs[0]->childNodes as $divChild) {
$dom->getElementsByTagName('body')[0]->appendChild($divChild);
}
$dom->getElementsByTagName('body')[0]->removeChild($bodyDivs[0]);
}
the <div> is removed, but its children are not appended to the <body> before the removal.
If I try a reverse loop like:
$k = count($bodyDivs[0]->childNodes);
for($n = $k-1; $n >= 0; $n--) {
$dom->getElementsByTagName('body')[0]->appendChild($bodyDivs[0]->childNodes[$n]);
}
$dom->getElementsByTagName('body')[0]->removeChild($bodyDivs[0]);
the children are added to the <body>, but in reverse order, so I get:
<body>
<footer>Footer</footer>
<section>Section</section>
<header>Header</header>
</body>
but I need:
<body>
<header>Header</header>
<section>Section</section>
<footer>Footer</footer>
</body>
How can I resolve this?
Your original code is very close, just missing one key point.
Original code
foreach($bodyDivs[0]->childNodes as $divChild) {
$dom->getElementsByTagName('body')[0]->appendChild($divChild);
}
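Trying to foreach over a list of nodes while also removing nodes from that same list (in your case, moving them to the <body>) does not behave as you intended. childNodes is a live list, so every node you move out of the <div> also disappears from the list the loop is walking through, and the loop ends up skipping nodes or stopping early.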
Simplified, complete example for demonstration purposes:
<?php
$doc = new DOMDocument;
$doc->loadXML('<example><a/><b/><c/><d/><e/></example>');
$parent = $doc->documentElement;
foreach ($parent->childNodes as $child) {
$parent->removeChild($child);
}
echo $doc->saveXML();
This outputs the following:
<?xml version="1.0"?>
<example><b/><c/><d/><e/></example>
Totally sensible, right?! Fear not, we can do better.
What to do?
A common approach that does behave as intended is to keep taking the first node from the list until the list is empty.
<?php
$doc = new DOMDocument;
$doc->loadXML('<example><a/><b/><c/><d/><e/></example>');
$parent = $doc->documentElement;
while ($parent->childNodes->length > 0) {
$child = $parent->childNodes->item(0);
$parent->removeChild($child);
}
echo $doc->saveXML();
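Another option, swapped in here rather than taken from the answer, is to snapshot the live list into a plain PHP array with iterator_to_array() before looping. The foreach then walks the copy, so removing nodes from the document no longer disturbs the iteration. A minimal sketch on the same <example> document:

```php
<?php
$doc = new DOMDocument;
$doc->loadXML('<example><a/><b/><c/><d/><e/></example>');
$parent = $doc->documentElement;

// iterator_to_array() copies the nodes of the live DOMNodeList into an
// ordinary array, so later removals do not change what we iterate over.
$children = iterator_to_array($parent->childNodes);
foreach ($children as $child) {
    $parent->removeChild($child);
}

echo $doc->saveXML(); // all five children are gone
```

The trade-off is a little extra memory for the snapshot, in exchange for being able to keep a plain foreach.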
Applied to your code
All of the above means that your original foreach:
foreach($bodyDivs[0]->childNodes as $divChild) {
$dom->getElementsByTagName('body')[0]->appendChild($divChild);
}
can be replaced with a while loop:
while ($bodyDivs[0]->childNodes->length > 0) {
$divChild = $bodyDivs[0]->childNodes->item(0);
$dom->getElementsByTagName('body')->item(0)->appendChild($divChild);
}
Aside: I used the ->item(0) notation above, as that's more conventional.
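For completeness, here is a self-contained sketch of the while loop applied to the HTML from the question. As a simplification, it assumes the document contains exactly one <div> and looks it up with getElementsByTagName('div') rather than scanning the <body>'s children as the question does:

```php
<?php
$pageHTML = '<html>
<head></head>
<body>
<div class="some class">
<header>Header</header>
<section>Section</section>
<footer>Footer</footer>
</div>
</body>
</html>';

$dom = new DOMDocument;
// libxml's HTML parser does not know <header>/<section>/<footer> and would warn.
libxml_use_internal_errors(true);
$dom->loadHTML($pageHTML);
libxml_clear_errors();
libxml_use_internal_errors(false);

$body = $dom->getElementsByTagName('body')->item(0);
$div = $dom->getElementsByTagName('div')->item(0); // assumes the <div> exists

// Always move the first remaining child; the live list shrinks on each pass.
while ($div->childNodes->length > 0) {
    $body->appendChild($div->childNodes->item(0));
}
$body->removeChild($div);

echo $dom->saveHTML($body);
```

The children come out in their original order: header, then section, then footer.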
OK, I've found my own solution, but maybe someone will post a more elegant one:
if(count($bodyDivs) == 1) {
$count = count($bodyDivs[0]->childNodes);
$arr = [];
for($n = $count-1; $n >= 0; $n--) {
$arr[] = $bodyDivs[0]->childNodes[$n];
}
for($n = $count-1; $n >= 0; $n--) {
$dom->getElementsByTagName('body')[0]->appendChild($arr[$n]);
}
$dom->getElementsByTagName('body')[0]->removeChild($bodyDivs[0]);
}
echo str_replace("\n\r", "", $dom->saveHTML((new \DOMXPath($dom))->query('/')->item(0)));
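A possibly more compact alternative to the two array loops (not from either post) is to insertBefore() each child in front of the wrapper and then remove the empty wrapper. Unlike appendChild(), this puts the children exactly where the <div> was, which matters if the <body> has other content before or after it. The helper name unwrap() is hypothetical, and the sample HTML below is a made-up variation of the question's markup:

```php
<?php
// Hypothetical helper: replace $wrapper with its own children, in order.
function unwrap(DOMElement $wrapper): void
{
    $parent = $wrapper->parentNode;
    // Move the first remaining child in front of the wrapper until it is empty.
    while ($wrapper->firstChild !== null) {
        $parent->insertBefore($wrapper->firstChild, $wrapper);
    }
    $parent->removeChild($wrapper);
}

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML('<html><body><p>before</p><div class="some class"><header>Header</header><section>Section</section><footer>Footer</footer></div><p>after</p></body></html>');
libxml_clear_errors();
libxml_use_internal_errors(false);

unwrap($dom->getElementsByTagName('div')->item(0));
echo $dom->saveHTML($dom->getElementsByTagName('body')->item(0));
```

Because the loop always reads firstChild afresh, it never iterates over a list that is being modified.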