user90501

Reputation: 97

Remove useless paragraph tags from string

If I have a string like:

<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
<p class="b">&nbsp;</p>
<p>blah blah blah this is some real content</p>
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>

How can I turn it into just:

<p>blah blah blah this is some real content</p>

The regex needs to match both &nbsp; entities and whitespace.

Upvotes: 4

Views: 15318

Answers (3)

Peter Boughton

Reputation: 112190

This regex will work against your example:

<p[^>]*>(?:\s+|(?:&nbsp;)+|(?:<br\s*/?>)+)*</p>
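As a rough illustration of how this pattern could be applied in PHP, here is a minimal sketch. The `#` delimiters, the `i` modifier, the `trim()` call, and the sample input (copied from the question) are assumptions added for the example, not part of the answer itself.

```php
<?php
// Sample input modeled on the question's example.
$input = <<<HTML
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
<p class="b">&nbsp;</p>
<p>blah blah blah this is some real content</p>
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
HTML;

// The pattern from this answer, wrapped in # delimiters so the
// slashes in the closing tags need no escaping; i makes it case-insensitive.
$pattern = '#<p[^>]*>(?:\s+|(?:&nbsp;)+|(?:<br\s*/?>)+)*</p>#i';

// Remove the empty paragraphs, then trim the newlines left behind.
$result = trim(preg_replace($pattern, '', $input));

echo $result, "\n";
```

Note that `<p[^>]*>` also matches the opening of any tag whose name merely starts with `p`, so adding a word boundary (`<p\b`) may be worth considering.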

Upvotes: 5

lsl

Reputation: 4419

As the original replier stated, regex isn't the best solution here; what you want is some sort of HTML stripper.

A function on this site should help you out: http://nadeausoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page

You just need to use a bit of string manipulation to get the newlines and whatnot back to the format you want.

Upvotes: 1

Edward Z. Yang

Reputation: 26742

$result = preg_replace('#<p[^>]*>(\s|&nbsp;?)*</p>#', '', $input);

This doesn't catch literal non-breaking space characters (U+00A0, as opposed to the &nbsp; entity), but those are very rare to see.
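If literal non-breaking spaces or the `<br />`-only paragraphs from the question do need to be covered, the pattern could be extended along these lines. This is only a sketch: the function name `remove_empty_paragraphs` is hypothetical, and the `u` modifier assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper; the name is illustrative, not from the answer.
function remove_empty_paragraphs(string $html): string
{
    // \x{00A0} matches a literal non-breaking space; the u modifier
    // makes PCRE treat pattern and subject as UTF-8.
    // <p\b keeps the pattern from matching tags such as <pre>.
    $pattern = '#<p\b[^>]*>(?:\s|&nbsp;?|\x{00A0}|<br\s*/?>)*</p>#iu';

    $result = preg_replace($pattern, '', $html);

    // preg_replace returns null on failure (for example, invalid UTF-8),
    // in which case the input is returned unchanged.
    return $result ?? $html;
}

$input = "<p>\u{00A0}</p><p class=\"a\"><br /></p><p>real content</p><p>&nbsp; </p>";
echo remove_empty_paragraphs($input), "\n";
```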

Since you're dealing with HTML, if this is user-input I might suggest using HTML Purifier, which will also deal with XSS vulnerabilities. The configuration setting you want there to remove empty p tags is %AutoFormat.RemoveEmpty.
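A minimal configuration sketch for that setting might look like the following. It assumes HTML Purifier is installed through Composer (hence the autoload path, which you would adjust for your setup), and the `$dirty` sample string is made up for illustration. `AutoFormat.RemoveEmpty.RemoveNbsp` is a companion directive that makes the empty-element check treat non-breaking spaces as empty.

```php
<?php
// Assumption: HTML Purifier was installed via Composer.
require_once __DIR__ . '/vendor/autoload.php';

// Made-up sample input for illustration.
$dirty = '<p>&nbsp;</p><p></p><p>blah blah blah this is some real content</p>';

$config = HTMLPurifier_Config::createDefault();
// Drop elements that contain no meaningful content.
$config->set('AutoFormat.RemoveEmpty', true);
// Also treat elements holding only non-breaking spaces as empty.
$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);

$purifier = new HTMLPurifier($config);
$clean = $purifier->purify($dirty);

echo $clean, "\n";
```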

Upvotes: 14
