Reputation: 346
I have a structure that goes like this:
<h3><span class="header" id="first_set">My Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<h3><span class="header" id="second_set">My Second Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<h3><span class="header" id="third_set">My Third Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
I extracted this from a web page using DOMDocument. I need to iterate through 9000 pages, all of which vary slightly, so the "Third Heading" might in fact be a table in some instances instead of another h3.
What I am trying to do reliably is wrap a div around the second heading and its lists, closing the div when there are no more <ul> elements (that is, as soon as it hits anything that's not a ul tag). So the result would be something like this:
<h3><span class="header" id="first_set">My Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<div class="second_heading">
<h3><span class="header" id="second_set">My Second Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
</div>
<h3><span class="header" id="third_set">My Third Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
I'm thinking preg_replace, but I'm not sure how to do the logic of "close div when last closing ul tag is found".
Upvotes: 0
Views: 466
Reputation: 147166
You can achieve this while still working with your DOMDocument. I'm assuming you have a variable called $node which is the node above the HTML you show in your question. In that case, you can find all the child nodes of that element using DOMXPath, then iterate through them until you get to the second <h3> and append that and all subsequent <ul> elements to a new <div> until you get to the first non-<ul> element after the second header:
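// $doc is the DOMDocument, $node is the parent of the headings and lists
$div = $doc->createElement('div');
$div->setAttribute('class', 'second_heading');
$xpath = new DOMXPath($doc);
$headers = 0;
// './*' selects only the element children of $node (text nodes are skipped)
foreach ($xpath->query('./*', $node) as $child) {
    switch ($child->nodeName) {
        case 'h3':
            $headers++;
            if ($headers == 2) {
                // put the div where the second header was, then move the header into it
                $node->replaceChild($div, $child);
                $div->appendChild($child);
            } elseif ($headers == 3) {
                // a third header ends the section
                break 2;
            }
            break;
        case 'ul':
            // lists that follow the second header belong in the div
            if ($headers == 2) $div->appendChild($child);
            break;
        default:
            // if a non-ul element after the 2nd header, exit the loop
            if ($headers == 2) break 2;
            break;
    }
}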
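For trying this out end to end, here is a minimal self-contained sketch of the same idea, written as a sibling walk rather than an XPath query. The helper name wrapHeadingSection, the wrapping <div id="content"> container, and the sample markup (with a table standing in for the third heading) are assumptions made for illustration; they are not part of the question. It assumes PHP 7.1 or later with the dom extension.

```php
<?php
// Hypothetical helper (not from the original answer): wraps the $n-th <h3>
// child of $parent, plus the <ul> elements that directly follow it, in a
// new <div class="$class">. Returns the new div, or null if no such <h3>.
function wrapHeadingSection(DOMElement $parent, int $n, string $class): ?DOMElement
{
    // Locate the $n-th <h3> among the direct children.
    $seen = 0;
    $heading = null;
    foreach ($parent->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE
            && $child->nodeName === 'h3'
            && ++$seen === $n) {
            $heading = $child;
            break;
        }
    }
    if ($heading === null) {
        return null;
    }

    $div = $parent->ownerDocument->createElement('div');
    $div->setAttribute('class', $class);
    $parent->insertBefore($div, $heading);

    // Remember the next sibling before the heading is moved into the div.
    $next = $heading->nextSibling;
    $div->appendChild($heading);

    while ($next !== null) {
        $after = $next->nextSibling; // read before $next is possibly moved
        if ($next->nodeType === XML_ELEMENT_NODE) {
            if ($next->nodeName !== 'ul') {
                break; // first non-ul element (h3, table, ...) closes the section
            }
            $div->appendChild($next);
        }
        // Non-element nodes (e.g. whitespace text) are left where they are.
        $next = $after;
    }
    return $div;
}

// Sample markup, assumed for the demo; here a table follows the second section.
$html = <<<HTML
<div id="content">
<h3><span class="header" id="first_set">My Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<h3><span class="header" id="second_set">My Second Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<table><tr><td>Other content</td></tr></table>
<ul><li>Not part of the second section</li></ul>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$container = $doc->getElementsByTagName('div')->item(0);
$wrapper = wrapHeadingSection($container, 2, 'second_heading');
echo $doc->saveHTML($container), "\n";
```

Because the walk stops at the first element that is not a ul, it behaves the same whether the next section starts with an h3 or a table, which is the variation mentioned in the question.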
Upvotes: 1