Match all newlines between two tags

Question

In a string that represents HTML markup, I need to remove all newlines that are between any <ul> and </ul> tags. Here is an example string:


<ul>\n<li>element 1</li>\n<li>element 2</li>\n</ul>\nHello there

So all the \n inside the <ul>...</ul> need to be removed.

I've tried the following but it doesn't seem to be working correctly:

https://regex101.com/r/qLxSys/1

/<ul>.*?(\n)?.*?<\/ul>/

Can anybody please help me understand how I'd accomplish my goal?

mquantin · Accepted Answer

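To match a newline between <ul> and </ul>:

(?<=<ul>).*?(\n).*(?=</ul>)

Group 1 only matches one \n character inside <ul>...</ul>, so the substitution has to be repeated until no match is left.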

In Python3:

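#!python3
import re

# Sample markup; the exact tags here are an assumption.
string = "<ul>\n<li>element 1</li>\n<li>element 2</li>\n</ul>\nHello there"
# Group 2 captures a single newline between <ul> and </ul>.
# re.DOTALL lets '.' match newlines; it is passed as an argument because
# inline flags such as (?s) are only accepted at the start of a pattern
# in current Python versions.
pattern = re.compile(r'(?<=<ul>)(.*?)(\n)(.*)(?=</ul>)', re.DOTALL)
# Each substitution removes one newline, so repeat until none is left.
while pattern.search(string):
    string = pattern.sub(r'\g<1>\g<3>', string)
print(string)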

In the above example, the last \n is not replaced because it is not between <ul> and </ul>.
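The loop above re-scans the string once per newline, and a greedy match that runs to the last closing tag would also remove newlines lying between two separate lists. As a possible alternative, here is a minimal single-pass sketch. It assumes the lists are written as plain <ul>...</ul> with no attributes and no nesting; the name strip_newlines_in_ul is hypothetical and only used for illustration.

```python
import re

# Assumption: lists appear as plain <ul>...</ul>, without attributes or nesting.
# The lazy .*? stops at the first </ul>, so each list is matched on its own.
UL_BLOCK = re.compile(r"<ul>.*?</ul>", re.DOTALL)


def strip_newlines_in_ul(markup: str) -> str:
    """Remove every newline inside each <ul>...</ul> block; leave the rest untouched."""
    return UL_BLOCK.sub(lambda m: m.group(0).replace("\n", ""), markup)


sample = (
    "<ul>\n<li>element 1</li>\n<li>element 2</li>\n</ul>\n"
    "Hello there\n"
    "<ul>\n<li>a</li>\n</ul>"
)
print(strip_newlines_in_ul(sample))
```

The callback receives one whole list at a time, so newlines outside the lists (here, the ones around "Hello there") are never touched.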

Another, cleaner solution is to use an HTML parser (e.g. BeautifulSoup in Python) to get only the <ul> elements, and then use a regex to match the '\n' characters within them.
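The parser-based idea might look roughly like the sketch below. Note that it swaps in Python's standard-library html.parser rather than BeautifulSoup, so that it runs without extra packages. The names UlNewlineStripper and strip_ul_newlines are hypothetical, and the sketch only re-emits tags, text, and character references (comments and doctype declarations are dropped).

```python
from html.parser import HTMLParser


class UlNewlineStripper(HTMLParser):
    """Re-emit the markup, dropping newlines from text found inside <ul>."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.depth = 0  # number of <ul> elements currently open
        self.out = []   # pieces of the rebuilt markup

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self.depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "ul" and self.depth:
            self.depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Only text inside a <ul> loses its newlines.
        self.out.append(data.replace("\n", "") if self.depth else data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_ul_newlines(markup: str) -> str:
    parser = UlNewlineStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(strip_ul_newlines("<ul>\n<li>element 1</li>\n<li>element 2</li>\n</ul>\nHello there\n"))
```

Because the parser tracks how many <ul> elements are open, nested lists and <ul> tags with attributes are handled without changing the pattern, which is where a pure regex tends to get fragile.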
