RegEx: remove double <br /> tags

Question

I have a dynamic string that may contain h2 tags, and some of those h2 tags contain br tags. I want to remove those br tags from the string.

Headline 1
Lorem ipsum dolor sit amet, consetetur sadipscing elitr.Headline 2 

Lorem ipsum dolor sit amet, consetetur sadipscing elitrHeadline 2

Lorem ipsum dolor sit amet, consetetur sadipscing elitrHeadline 2Lorem ipsum dolor sit amet, consetetur sadipscing elitr

To remove the br tags, I use this regex:

/<h2>.+?(<br \/>).+?<\/h2>/

The problem is that my first match is

`Headline 1`

Lorem ipsum dolor sit amet, consetetur sadipscing elitr.Headline 2

Yes, it works as designed :-) But how can I make the regex match only the h2 tags that contain a br?

virolino · Accepted Answer

It might be much easier to do it in more than 1 step:

Use regex to extract the <h2>...</h2> sequence
Use regex to remove the <br /> tags from the <h2>...</h2> sequence
Write the new string
Copy everything else as-is
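
The steps above could be sketched in Python roughly like this. The sample string, the two patterns, and the name `strip_br_in_h2` are assumptions made up for illustration; they are not taken from the question.

```python
import re

# Hypothetical sample input; the exact tag layout is an assumption.
html = (
    "<h2>Headline 1</h2><p>Lorem ipsum</p>"
    "<h2>Headline 2<br /><br /></h2><p>Dolor sit amet</p>"
)

# Step 1: a pattern that finds each <h2>...</h2> sequence (non-greedy).
H2_BLOCK = re.compile(r"<h2\b[^>]*>.*?</h2>", re.IGNORECASE | re.DOTALL)
# Step 2: a pattern that finds <br>, <br/> or <br /> tags.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def strip_br_in_h2(text: str) -> str:
    # For every h2 block, remove the br tags inside it.
    # re.sub writes the cleaned block back (step 3) and copies
    # everything outside the matches unchanged (step 4).
    return H2_BLOCK.sub(lambda m: BR_TAG.sub("", m.group(0)), text)


cleaned = strip_br_in_h2(html)
print(cleaned)
```

Because the second regex only ever sees the text of one h2 block, it cannot run across a closing `</h2>` into the next headline, which is what went wrong with the single lazy pattern.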

Alternatively, search for:

(<\s*h2[^<]*>[^<]*)<\s*br\s*\/\s*>

and replace with:

\1

Repeat until no more replacements are done.
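
A minimal Python sketch of that repeat-until-stable loop, using the pattern above unchanged. The function name `remove_br_repeatedly` and the sample string are assumptions for illustration only.

```python
import re

# The search pattern from the answer, unchanged.
PATTERN = re.compile(r"(<\s*h2[^<]*>[^<]*)<\s*br\s*\/\s*>")


def remove_br_repeatedly(text: str) -> str:
    # Each pass removes at most one <br /> per h2 tag, so keep going
    # until a pass makes no replacement.
    while True:
        text, count = PATTERN.subn(r"\1", text)
        if count == 0:
            return text


# Hypothetical input: two br tags inside the h2, one outside it.
sample = "<h2>Headline 2<br /> <br /></h2><p>Lorem<br />ipsum</p>"
result = remove_br_repeatedly(sample)
print(result)
```

Note that `[^<]*` stops at the first `<` it meets, so this pattern only reaches a br tag if no other tag sits between the opening h2 and that br.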


The other (smarter) solution is to use a proper HTML parser and make whatever changes you want on the parsed document.
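
As a rough sketch of the parser route, here is one way it might look with `html.parser` from the Python standard library. The class name `BrInH2Stripper`, the helper `strip_br_in_h2`, and the sample input are assumptions for illustration. The sketch re-emits the input as it goes and skips br tags while inside an h2; end tags come out lowercased, and other constructs such as processing instructions are not handled.

```python
from html.parser import HTMLParser


class BrInH2Stripper(HTMLParser):
    """Re-emits the input, dropping <br> tags found inside <h2> elements."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.h2_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.h2_depth += 1
        if tag == "br" and self.h2_depth > 0:
            return  # drop <br> inside a heading
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Called for the self-closing form, e.g. <br />.
        if tag == "br" and self.h2_depth > 0:
            return
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "h2" and self.h2_depth > 0:
            self.h2_depth -= 1
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_br_in_h2(html: str) -> str:
    parser = BrInH2Stripper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


# Hypothetical input: br tags inside and outside an h2.
sample = "<h2>Headline 2<br /><br></h2><p>Lorem<br />ipsum &amp; dolor</p>"
print(strip_br_in_h2(sample))
```

Unlike the regex versions, this one also handles br tags that come after other nested tags inside the h2, since it tracks whether it is inside the heading rather than pattern-matching on the text.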
