GenesisBits

Reputation: 346

Wrap an H3 tag and all UL tags under it in a div

I have a structure that goes like this:

<h3><span class="header" id="first_set">My Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<h3><span class="header" id="second_set">My Second Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<h3><span class="header" id="third_set">My Third Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>

I extracted this from a web page using DOMDocument. I need to iterate through 9000 pages, which all have slight variations. For example, the "Third Heading" might be a table in some instances instead of another h3.

What I'm trying to do is wrap the second heading in a div and close the div after the last <ul> that follows it (that is, as soon as it hits anything that's not a ul tag). So the result would be something like this:

<h3><span class="header" id="first_set">My Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<div class="second_heading">
<h3><span class="header" id="second_set">My Second Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
</div>
<h3><span class="header" id="third_set">My Third Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>

I'm thinking of preg_replace, but I'm not sure how to express the logic of "close the div after the last closing ul tag".

Upvotes: 0

Views: 466

Answers (1)

Nick

Reputation: 147166

You can do this without leaving DOMDocument. I'm assuming you have a variable called $node that holds the parent of the elements shown in your question. Use DOMXPath to select the element children of $node, then iterate through them: when you reach the second <h3>, replace it with a new <div> and move the header into that div, then move each following <ul> into the div as well. Stop at the first non-<ul> element after the second header:

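$div = $doc->createElement('div');
// class name taken from the desired output in the question
$div->setAttribute('class', 'second_heading');
$xpath = new DOMXPath($doc);
$headers = 0;
// './*' selects only the element children of $node
foreach ($xpath->query('./*', $node) as $child) {
    switch ($child->nodeName) {
        case 'h3':
            $headers++;
            if ($headers == 2) {
                // put the div where the 2nd header was, then move the header into it
                $node->replaceChild($div, $child);
                $div->appendChild($child);
            }
            else if ($headers == 3) {
                // a 3rd header ends the section, exit the loop
                break 2;
            }
            break;
        case 'ul':
            // only lists that follow the 2nd header are moved into the div
            if ($headers == 2) $div->appendChild($child);
            break;
        default:
            // if a non-ul element after the 2nd header, exit the loop
            if ($headers == 2) break 2;
            break;
    }
}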

Demo on 3v4l.org
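If you need to run this over many pages, it may help to see the same idea as a self-contained script. This is only a rough sketch under a few assumptions: the function name wrapHeadingSection, the sample markup, and the id="content" container are all made up for illustration and are not from your pages. It picks the Nth <h3> with XPath, snapshots the elements that follow it, and moves the leading run of <ul> elements into the new div:

```php
<?php
// Hypothetical helper (name and signature are illustrative only): wraps the
// $n-th <h3> child of $parent, plus the <ul> elements directly after it, in a <div>.
function wrapHeadingSection(DOMElement $parent, int $n, string $class): ?DOMElement
{
    $doc = $parent->ownerDocument;
    $xpath = new DOMXPath($doc);

    $h3 = $xpath->query(sprintf('./h3[%d]', $n), $parent)->item(0);
    if ($h3 === null) {
        return null; // this page has fewer than $n headings
    }

    // Snapshot the following element siblings before moving anything.
    $siblings = iterator_to_array($xpath->query('following-sibling::*', $h3));

    $div = $doc->createElement('div');
    $div->setAttribute('class', $class);
    $parent->replaceChild($div, $h3); // div takes the heading's place
    $div->appendChild($h3);           // heading becomes the div's first child

    foreach ($siblings as $sibling) {
        if ($sibling->nodeName !== 'ul') {
            break; // first non-ul element (h3, table, ...) ends the section
        }
        $div->appendChild($sibling);
    }
    return $div;
}

// Sample markup (an assumption): here a <table> follows the second section.
$html = '<div id="content">'
      . '<h3>First</h3><ul><li>a</li></ul>'
      . '<h3>Second</h3><ul><li>b</li></ul><ul><li>c</li></ul>'
      . '<table><tr><td>x</td></tr></table><ul><li>d</li></ul>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$container = (new DOMXPath($doc))->query('//div[@id="content"]')->item(0);

$wrapper = wrapHeadingSection($container, 2, 'second_heading');
echo $doc->saveHTML($container), "\n";
```

Returning null when the heading is missing lets the calling loop skip pages that don't have a second section instead of failing on them.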

Upvotes: 1
