Reputation: 1969
I have a dynamic string that may contain h2 tags, and those h2 tags may contain br tags. I want to remove those br tags from the string.
<h2>Headline 1</h2>Lorem ipsum dolor sit amet, consetetur sadipscing elitr.<h2>Headline 2 <br /><br /></h2>Lorem ipsum dolor sit amet, consetetur sadipscing elitr<h2>Headline 2<br /><br /></h2>Lorem ipsum dolor sit amet, consetetur sadipscing elitr<h2>Headline 2</h2>Lorem ipsum dolor sit amet, consetetur sadipscing elitr
To remove the br tags, I use this regex:
/<h2.*?>.+?(<br[\s+]?\/>).+?<\/h2>/
The problem is that my first match is:
<h2>Headline 1</h2>Lorem ipsum dolor sit amet, consetetur sadipscing elitr.<h2>Headline 2 <br /><br /></h2>
Yes, works as designed :-) But how can I make the regex match only the h2 tags that actually contain a br?
Upvotes: 0
Views: 107
Reputation: 91385
I suggest you use a DOM parser.
But if you really want to use regex, which is acceptable in this case, you can use preg_replace_callback:
$html = '<h2>Headline 1</h2>Lorem ipsum.<h2>Headline 2 <br /><br /></h2>dolor sit amet,<h2>Headline 2<br /><br /></h2>consetetur<br /> sadipscing elitr<h2>Headline 2</h2>Lorem<br /> ipsum';
# first, match the string inside each <h2>...</h2>
$res = preg_replace_callback('~<h2>\K.*?(?=</h2>)~',
    function($m) {
        # then remove the <br /> tags from it
        return preg_replace('~<br />~', '', $m[0]);
    },
    $html);
echo $res;
Output:
<h2>Headline 1</h2>Lorem ipsum.<h2>Headline 2 </h2>dolor sit amet,<h2>Headline 2</h2>consetetur<br /> sadipscing elitr<h2>Headline 2</h2>Lorem<br /> ipsum
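The pattern above only matches a bare <h2> and the exact spelling <br />. If the input may carry attributes on the h2 (the question's own regex uses <h2.*?>) or other br spellings, a slightly more tolerant variant could look like the sketch below. The function name stripBrInsideH2 is made up for illustration, and the sketch assumes h2 elements are not nested and that no attribute value contains a > character.

```php
<?php
// Hypothetical helper, not part of the original answer.
function stripBrInsideH2(string $html): string
{
    // Match a whole <h2 ...>...</h2> element.
    // i: case-insensitive, s: let "." span line breaks.
    $result = preg_replace_callback(
        '~<h2\b[^>]*>.*?</h2>~is',
        function (array $m): string {
            // Remove <br>, <br/> and <br /> inside the matched heading only.
            return preg_replace('~<br\s*/?>~i', '', $m[0]);
        },
        $html
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

$html = '<h2 class="title">Headline 2 <br /><br></h2>Lorem<br /> ipsum';
echo stripBrInsideH2($html), PHP_EOL;
// prints: <h2 class="title">Headline 2 </h2>Lorem<br /> ipsum
```

The callback receives the whole heading, opening and closing tags included, so \K and the lookahead are not needed here. The br outside the h2 is left alone.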
Upvotes: 1
Reputation: 2201
It might be much easier to do it in more than 1 step:
1. Match each <h2>...</h2> sequence.
2. Remove the <br /> tags from that <h2>...</h2> sequence.
Alternatively, search for:
(<\s*h2[^<]*>[^<]*)<\s*br\s*\/\s*>
and replace with:
\1
Repeat until no more replacements are done.
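In PHP, the "repeat until nothing changes" part could be sketched roughly like this. The function name removeBrByRepetition is made up for illustration. The search pattern is the one given above, wrapped in ~ delimiters, and \1 is written as $1.

```php
<?php
// Hypothetical helper illustrating the search/replace loop described above.
function removeBrByRepetition(string $html): string
{
    // Search pattern from the answer, with ~ as the PCRE delimiter.
    $pattern = '~(<\s*h2[^<]*>[^<]*)<\s*br\s*\/\s*>~';

    do {
        // $count receives the number of replacements made in this pass.
        $html = preg_replace($pattern, '$1', $html, -1, $count);
    } while ($count > 0);

    return $html;
}

echo removeBrByRepetition('<h2>Headline 2 <br /><br /></h2>Lorem<br /> ipsum'), PHP_EOL;
// prints: <h2>Headline 2 </h2>Lorem<br /> ipsum
```

Each pass removes at most one br per h2, which is why the loop is needed. Note that [^<]* stops at the first < after the opening tag, so a br that follows another nested tag inside the h2 (an <em>, say) is not reached by this pattern.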
The other (smarter) solution is to use a proper HTML parser and do whatever processing you need on the parsed document.
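For the parser route, a minimal sketch with PHP's bundled DOM extension might look like the following. The function name removeBrFromH2WithDom and the wrapper div are assumptions made for this example. Keep in mind that the parser re-serialises the markup, so the output is not guaranteed to be byte-identical to the input outside the headings.

```php
<?php
// Hypothetical helper; assumes the DOM and libxml extensions are available.
function removeBrFromH2WithDom(string $html): string
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment so it can be found again after parsing;
    // the meta tag hints that the input is UTF-8.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div>' . $html . '</div></body></html>'
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Copy the node list to an array first, so removing nodes
    // does not disturb the iteration.
    foreach (iterator_to_array($xpath->query('//h2//br')) as $br) {
        $br->parentNode->removeChild($br);
    }

    // Serialise only the children of the wrapper div.
    $wrapper = $xpath->query('//body/div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

echo removeBrFromH2WithDom('<h2>Headline 2 <br /><br /></h2>Lorem<br /> ipsum'), PHP_EOL;
```

The XPath expression //h2//br selects br elements at any depth inside an h2, so it also covers cases the regex approaches miss, such as a br nested inside another inline tag. The br elements outside headings stay in place, though saveHTML typically writes them back as <br> rather than <br />.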
Upvotes: 1