user1513171

Reputation: 1974

PHP preg_match - matching html elements

I have a regular expression that I'm trying to use to match a certain pattern in some HTML files. Here's the preg_match statement:

preg_match('@<'.$htmlElementType.' id\s*=\s*"{{ALViewElement_'.$this->_elementId.'}}".*>[\s\S]*</'.$htmlElementType.'(>)@i', $htmlString, $newMatches, PREG_OFFSET_CAPTURE)

To be clear, this attempts to match an HTML element with an id of {{ALViewElement_.*}}, and the match must also end with that element's closing tag. For example, if $htmlElementType were "section", the match would end in "</section>".

If my html looked just like this with nothing else in it, it works as expected:

<section id="{{ALViewElement_resume}}">
            <!--{{RESUME_ADD_CHANGE_PIECE}}-->
            <!--{{RESUME}}-->
        </section>

The problem arises when another section element appears later in the HTML and it ALSO has a closing </section>. Example:

<section id="{{ALViewElement_resume}}">
            <!--{{RESUME_ADD_CHANGE_PIECE}}-->
            <!--{{RESUME}}-->
        </section>
        <div>

        </div>
        <section>
            HEY THIS IS ME
        </section>

In this case the full match is everything above. But I want it to stop at the </section> that closes my first one. This is important because later on in my code I need the location of the last > in that closing tag.

Any ideas how I could change this regular expression a little bit?

Thanks for the help!

Upvotes: 1

Views: 1336

Answers (1)

Casimir et Hippolyte

Reputation: 89547

Yes, just use ungreedy (lazy) quantifiers, .*? and [\s\S]*?, so the match stops at the first closing tag instead of the last one:

preg_match('@<'.$htmlElementType.' id\s*=\s*"{{ALViewElement_'.$this->_elementId.'}}".*?>[\s\S]*?</'.$htmlElementType.'(>)@i', $htmlString, $newMatches, PREG_OFFSET_CAPTURE)
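To see the effect in isolation, here is a minimal standalone sketch. The variables $htmlElementType and $elementId are hypothetical stand-ins for the values in the question (the original uses $this->_elementId), and wrapping the literal parts in preg_quote() is my own addition rather than something the question requires.

```php
<?php
// Hypothetical stand-ins for $htmlElementType and $this->_elementId.
$htmlElementType = 'section';
$elementId = 'resume';

$htmlString = <<<'HTML'
<section id="{{ALViewElement_resume}}">
    <!--{{RESUME_ADD_CHANGE_PIECE}}-->
    <!--{{RESUME}}-->
</section>
<div>

</div>
<section>
    HEY THIS IS ME
</section>
HTML;

// preg_quote() escapes regex metacharacters (such as { and }) in the literal parts.
$tag = preg_quote($htmlElementType, '@');
$id  = preg_quote('{{ALViewElement_' . $elementId . '}}', '@');

// .*? and [\s\S]*? are lazy, so the match ends at the first closing tag.
$pattern = '@<' . $tag . ' id\s*=\s*"' . $id . '".*?>[\s\S]*?</' . $tag . '(>)@i';

$matchedHtml = null;
$closingBracketOffset = null;

if (preg_match($pattern, $htmlString, $newMatches, PREG_OFFSET_CAPTURE) === 1) {
    // With PREG_OFFSET_CAPTURE every entry is [matched text, byte offset].
    $matchedHtml = $newMatches[0][0];
    // Group 1 is the final ">" of the closing tag.
    $closingBracketOffset = $newMatches[1][1];

    echo $matchedHtml, "\n";
    echo "Offset of the final '>': ", $closingBracketOffset, "\n";
}
```

Note that a lazy match still stops at the first closing tag it meets, so it will cut short if the target element contains a nested element of the same type.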

Another way, with DOMDocument:

$html = <<<LOD
<section id="{{ALViewElement_resume}}">
        <!--{{RESUME_ADD_CHANGE_PIECE}}-->
        <!--{{RESUME}}-->
</section>
<div>

</div>
<section>
    HEY THIS IS ME
</section>
LOD;
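$doc = new DOMDocument();
// @ suppresses the parser warnings that loadHTML() emits for imperfect markup
@$doc->loadHTML($html);
$node = $doc->getElementById("{{ALViewElement_resume}}");

if ($node !== null) {
    // Copy the node and its children into a fresh document so only that element is serialized
    $docv = new DOMDocument();
    $docv->appendChild($docv->importNode($node, TRUE));
    $result = $docv->saveHTML();
    echo htmlspecialchars($result);
}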

Upvotes: 2
