Kokos

Reputation: 9121

Remove HTML line breaks between <ul> tags

I have a CMS that also lets people enter HTML, but nl2br is applied at the end of the function, which turns this:

<ul>
<li></li>
</ul>

into this:

<ul><br />
<li></li><br />
</ul>

Now I want to remove the <br> tags that appear between the <ul> tags.

I already found another question that asks almost the same thing, but for newlines, and I've integrated that fix into my CMS. However, for one client all the content has already been entered, so I have to fix it after nl2br has turned the \n characters into <br> tags.

The other question provides this as a regex to match \n within <ul></ul>:

/(?<=<ul>|<\/li>)\s*?(?=<\/ul>|<li>)/is

I'd think something like this:

/(?<=<ul>|<\/li>)(<br>|<br\/>|<br \/>)(?=<\/ul>|<li>)/is

would do the trick, but it doesn't. What am I missing?

EDIT

I am very open to DOMDocument solutions; if there's a way to query line breaks with XPath, that would probably fix my problem.

Upvotes: 2

Views: 1568

Answers (2)

Karolis

Reputation: 9562

In the example you provided, the <br> tags are surrounded by whitespace (at least by newline characters), because nl2br inserts its tag before each newline rather than replacing the newline. The regular expression has to allow for that whitespace:

/(?<=<ul>|<\/li>)(\s*<br>\s*|\s*<br\/>\s*|\s*<br \/>\s*)(?=<\/ul>|<li>)/is 
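A minimal PHP sketch of how this pattern could be applied with preg_replace(). It assumes the text went through PHP's default nl2br() (which emits "<br />" followed by the original newline) and that the <ul> and <li> tags carry no attributes, since the lookarounds only match those literal strings. The helper name stripListBreaks is hypothetical.

```php
<?php
// Hypothetical helper: removes <br>, <br/> or <br /> (plus surrounding
// whitespace) sitting directly between <ul>/<li> boundaries.
// Assumes attribute-free <ul> and <li> tags.
function stripListBreaks(string $html): string
{
    // Regex from the answer above, unchanged.
    $pattern = '/(?<=<ul>|<\/li>)(\s*<br>\s*|\s*<br\/>\s*|\s*<br \/>\s*)(?=<\/ul>|<li>)/is';

    // preg_replace() returns null on error (e.g. a backtracking limit),
    // so fall back to the unmodified input in that case.
    return preg_replace($pattern, '', $html) ?? $html;
}

// nl2br() inserts "<br />" before each "\n" and keeps the newline itself.
$input = nl2br("<ul>\n<li>One</li>\n<li>Two</li>\n</ul>");
echo stripListBreaks($input), "\n";
// prints: <ul><li>One</li><li>Two</li></ul>
```

Replacing each match with an empty string also drops the newline; passing "\n" as the replacement instead keeps the original line layout.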

In many cases regular expressions are NOT the best way to parse HTML (I definitely agree with the comments above/below), but they are often good enough for narrow, well-defined tasks like this one.

Upvotes: 2

George Cummins

Reputation: 28906

There are plenty of examples on SO that demonstrate why parsing HTML with regular expressions is a bad idea, so I won't include another one here.

Instead, consider using an HTML parser such as HtmlCleaner (Java) or HTML Agility Pack (.NET) to accomplish this task.
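For PHP specifically, the built-in DOM extension (DOMDocument plus DOMXPath, which the question's EDIT asks about) can do the same job without a regex. This is a minimal sketch, assuming the content is a fragment that can be wrapped in a temporary <div> and that only <br> elements that are direct children of <ul> or <ol> should be removed. The function name removeListBreaks and the wrapper id are hypothetical. libxml's HTML parser may normalize the markup (for example, writing <br /> back out as <br>), and non-ASCII input may need an explicit encoding hint, which this sketch does not handle.

```php
<?php
// Hypothetical helper: removes <br> elements that are direct children of
// <ul> or <ol>, using PHP's DOM extension instead of a regular expression.
function removeListBreaks(string $html): string
{
    $doc = new DOMDocument();

    // Wrap the fragment so it can be extracted again after editing.
    // libxml warns about markup it considers invalid; collect those quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div id="list-break-wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Copy the matches into an array before removing anything from the tree.
    $breaks = iterator_to_array($xpath->query('//ul/br | //ol/br'));
    foreach ($breaks as $br) {
        $br->parentNode->removeChild($br);
    }

    // Serialize only the wrapper's children, not the implied <html>/<body>.
    $wrapper = $xpath->query('//div[@id="list-break-wrapper"]')->item(0);
    $result = '';
    foreach ($wrapper->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

// The <br> inside <p> and the one nl2br() adds after </p> are kept.
$input = nl2br("<p>Keep<br>this</p>\n<ul>\n<li>One</li>\n<li>Two</li>\n</ul>");
echo removeListBreaks($input), "\n";
```

The XPath expression only targets <br> elements whose parent is a list, so line breaks elsewhere in the content are left alone.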

Upvotes: 0
