Reputation: 1974
Ok so I have a regular expression I'm trying to use to match a certain pattern in some html files. Here's the preg_match statement:
preg_match('@<'.$htmlElementType.' id\s*=\s*"{{ALViewElement_'.$this->_elementId.'}}".*>[\s\S]*</'.$htmlElementType.'(>)@i', $htmlString, $newMatches, PREG_OFFSET_CAPTURE)
To be clear, this attempts to match an HTML element whose id is {{ALViewElement_.*}}, and the match also has to end with that element's closing tag. For example, if $htmlElementType was "section", it would end in "/section>".
If my html looked just like this with nothing else in it, it works as expected:
<section id="{{ALViewElement_resume}}">
<!--{{RESUME_ADD_CHANGE_PIECE}}-->
<!--{{RESUME}}-->
</section>
The problem is when we have a section element later in the html and it ALSO has a closing /section>. Example:
<section id="{{ALViewElement_resume}}">
<!--{{RESUME_ADD_CHANGE_PIECE}}-->
<!--{{RESUME}}-->
</section>
<div>
</div>
<section>
HEY THIS IS ME
</section>
In this case the full match is everything above. But I want it to stop at the closing </section> tag that belongs to my first element. This is important because later on in my code I need the location of the last > in that ending tag.
Any ideas how I could change this regular expression a little bit?
Thanks for the help!
Upvotes: 1
Views: 1336
Reputation: 89547
Yes, just use an ungreedy quantifier:
preg_match('@<'.$htmlElementType.' id\s*=\s*"{{ALViewElement_'.$this->_elementId.'}}".*?>[\s\S]*?</'.$htmlElementType.'(>)@i', $htmlString, $newMatches, PREG_OFFSET_CAPTURE)
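To see the difference, here is a minimal, self-contained sketch that runs both patterns on the sample HTML and compares the offset of the captured final ">". The variables $elementId, $open, $close, $greedy and $lazy are illustrative names, not part of the original code, and the preg_quote() calls are an optional precaution I added so that the braces and the "@" delimiter are always treated literally.

```php
<?php
// Hypothetical stand-ins for $htmlElementType and $this->_elementId from the question.
$htmlElementType = 'section';
$elementId = 'resume';

// Nowdoc, so nothing inside the sample is interpolated.
$htmlString = <<<'EOT'
<section id="{{ALViewElement_resume}}">
<!--{{RESUME_ADD_CHANGE_PIECE}}-->
<!--{{RESUME}}-->
</section>
<div>
</div>
<section>
HEY THIS IS ME
</section>
EOT;

// preg_quote() escapes regex metacharacters (the braces here) and the '@' delimiter.
$open  = '<' . preg_quote($htmlElementType, '@')
       . ' id\s*=\s*"' . preg_quote('{{ALViewElement_' . $elementId . '}}', '@') . '"';
$close = '</' . preg_quote($htmlElementType, '@') . '(>)';

$greedy = '@' . $open . '.*>[\s\S]*' . $close . '@i';   // original quantifiers
$lazy   = '@' . $open . '.*?>[\s\S]*?' . $close . '@i'; // ungreedy quantifiers

preg_match($greedy, $htmlString, $greedyMatches, PREG_OFFSET_CAPTURE);
preg_match($lazy, $htmlString, $lazyMatches, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE every entry is [matched text, byte offset].
// Group 1 is the final '>' of the closing tag.
$greedyEnd = $greedyMatches[1][1];
$lazyEnd   = $lazyMatches[1][1];

// Reference positions of the '>' in the first and last closing tags.
$closingTag    = '</section>';
$firstCloseEnd = strpos($htmlString, $closingTag) + strlen($closingTag) - 1;
$lastCloseEnd  = strrpos($htmlString, $closingTag) + strlen($closingTag) - 1;

echo "greedy ends at the last closing tag: ", var_export($greedyEnd === $lastCloseEnd, true), "\n";
echo "lazy ends at the first closing tag: ", var_export($lazyEnd === $firstCloseEnd, true), "\n";
```

Note that the lazy version stops at the first closing tag of that type, so it would still cut the match short if the target element contained a nested element of the same type.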
Another way, with DOMDocument:
$html = <<<LOD
<section id="{{ALViewElement_resume}}">
<!--{{RESUME_ADD_CHANGE_PIECE}}-->
<!--{{RESUME}}-->
</section>
<div>
</div>
<section>
HEY THIS IS ME
</section>
LOD;
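// Parse the HTML; @ suppresses the warnings libxml raises while parsing.
$doc = new DOMDocument();
@$doc->loadHTML($html);
// Look up the target element by its id attribute.
$node = $doc->getElementById("{{ALViewElement_resume}}");
// Deep-copy the element into an empty document and serialise only that element.
$docv = new DOMDocument();
$docv->appendChild($docv->importNode($node, TRUE));
$result = $docv->saveHTML();
echo htmlspecialchars($result);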
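A slightly shorter variant of the same idea, as a sketch rather than a drop-in replacement: it finds the element with an XPath query on the id attribute and passes the node straight to saveHTML(), which accepts an optional node argument, so no second document is needed. Using DOMXPath and libxml_use_internal_errors() here is my own choice, not something the question requires. Keep in mind that the DOM approach returns the element's markup, not byte offsets into the original string, so it only helps if the position of the last > is not strictly needed.

```php
<?php
// Shortened copy of the sample HTML (nowdoc, so nothing is interpolated).
$html = <<<'EOT'
<section id="{{ALViewElement_resume}}">
<!--{{RESUME_ADD_CHANGE_PIECE}}-->
<!--{{RESUME}}-->
</section>
<div>
</div>
<section>
HEY THIS IS ME
</section>
EOT;

// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select the first element whose id attribute equals the placeholder string.
$xpath = new DOMXPath($doc);
$node = $xpath->query('//*[@id="{{ALViewElement_resume}}"]')->item(0);

// saveHTML() with a node argument serialises only that node and its children.
$result = $node !== null ? $doc->saveHTML($node) : '';
echo htmlspecialchars($result), "\n";
```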
Upvotes: 2