jnbn

Reputation: 70

Regexp for cleaning the empty, unnecessary HTML tags

I'm using TinyMCE (WYSIWYG) as the default editor in one of my projects, and sometimes it automatically adds <p>&nbsp;</p>, <p> </p> or empty divs.

I have been searching, but I couldn't find a good way of cleaning up empty tags with regex.

The code I've tried to use is:

$pattern = "/<[^\/>]*>([\s]?)*<\/[^>]*>/";
$str = preg_replace($pattern, '', $str); 

Note: I also want to remove &nbsp; too :(
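A minimal sketch, assuming PHP's PCRE functions: it keeps the shape of the pattern in the question but replaces the `([\s]?)*` group with `(?:\s|&nbsp;)*`, so elements containing only whitespace or `&nbsp;` entities are matched as well. `removeEmptyTags()` is a hypothetical helper name. Like the original, it does not check that the closing tag matches the opening one, so something like `<b></i>` would also be removed.

```php
<?php
// Hypothetical helper: the question's pattern, with the optional-whitespace
// group replaced by (?:\s|&nbsp;)* so that <p>&nbsp;</p> is matched too.
function removeEmptyTags(string $str): string
{
    // <[^\/>]*>       an opening tag (no '/' right after '<')
    // (?:\s|&nbsp;)*  any mix of whitespace and &nbsp; entities
    // <\/[^>]*>       any closing tag (its name is NOT checked)
    $pattern = '/<[^\/>]*>(?:\s|&nbsp;)*<\/[^>]*>/';
    return preg_replace($pattern, '', $str);
}

echo removeEmptyTags('<p>&nbsp;</p><p>Hello</p><div> </div>'), "\n";
// prints: <p>Hello</p>
```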

Upvotes: 4

Views: 3379

Answers (5)

Chris Czopp

Reputation: 1

Try this:

<([\w]+)[^>]*?>(\s|&nbsp;)*<\/\1>
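A sketch of how this pattern might be used from PHP, assuming `preg_replace()`. The `(\w+)` group captures the tag name and `<\/\1>` only accepts a closing tag with that same name, so mismatched pairs are left alone, while `[^>]*?` lets the opening tag carry attributes. `removeEmptyElements()` is a hypothetical name used for illustration.

```php
<?php
// Hypothetical helper wrapping the backreference pattern.
function removeEmptyElements(string $html): string
{
    // ([\w]+)      capture the tag name
    // [^>]*?       skip any attributes on the opening tag
    // (\s|&nbsp;)* content is only whitespace and/or &nbsp;
    // <\/\1>       closing tag must repeat the captured name
    return preg_replace('/<([\w]+)[^>]*?>(\s|&nbsp;)*<\/\1>/', '', $html);
}

echo removeEmptyElements('<p class="x">&nbsp;</p><div>text</div><span> </span>'), "\n";
// prints: <div>text</div>
```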

Upvotes: 0

AppDeveloper

Reputation: 937

You would want multiple regexes to be sure you do not eliminate other wanted elements with one generic one.

As Ben said, you may drop valid elements with one generic regex:

<\s*[^>]*>\s*&nbsp;\s*<\s*[^>]*>
<\s*p\s*>\s*<\s*/p\s*>
<\s*div\s*>\s*<\s*/div\s*>
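A minimal sketch of applying these three patterns in PHP, assuming `preg_replace()`, which accepts an array of patterns and applies them in order with the same replacement string. The `~` delimiter is a choice made here so the `/` in closing tags needs no escaping; `cleanTinyMceOutput()` is a hypothetical name. One caveat: the first pattern does not distinguish opening from closing tags, so it would also remove `</b>&nbsp;<i>` sitting between two elements.

```php
<?php
// Hypothetical helper: run the narrower patterns one after another.
function cleanTinyMceOutput(string $html): string
{
    $patterns = [
        '~<\s*[^>]*>\s*&nbsp;\s*<\s*[^>]*>~', // a tag, a single &nbsp;, then another tag
        '~<\s*p\s*>\s*<\s*/p\s*>~',           // <p></p> with only whitespace inside
        '~<\s*div\s*>\s*<\s*/div\s*>~',       // <div></div> with only whitespace inside
    ];
    // A string replacement is used for every pattern in the array.
    return preg_replace($patterns, '', $html);
}

echo cleanTinyMceOutput('<p>&nbsp;</p><p></p><div> </div><p>Keep</p>'), "\n";
// prints: <p>Keep</p>
```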

Upvotes: 0

da5id

Reputation: 9136

I know it's not directly what you asked for, but after months of TinyMCE, coping with not only this but the hell that results from users posting directly from Word, I have made the switch to FCKeditor and couldn't be happier.

EDIT: Just in case it's not clear, what I'm saying is that FCKeditor doesn't insert arbitrary paragraphs wherever it feels like it, and it copes with pasted Word crap out of the box. You may find my previous question to be of help.

Upvotes: 0

user111325


Try /<(\w+)>(\s|&nbsp;)*<\/\1>/ instead. :)
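One thing a single pass of this pattern misses is nesting: in `<div><p>&nbsp;</p></div>` only the inner `<p>` is empty at first. A sketch, assuming `preg_replace()`'s fifth `$count` argument, that repeats the replacement until nothing more matches; `stripEmptyTags()` is a hypothetical name. Note this pattern, as written, only matches tags without attributes.

```php
<?php
// Hypothetical helper: strip empty elements repeatedly so that wrappers
// emptied by a previous pass are removed as well.
function stripEmptyTags(string $html): string
{
    $pattern = '/<(\w+)>(\s|&nbsp;)*<\/\1>/';
    do {
        // $count receives the number of replacements made in this pass.
        $html = preg_replace($pattern, '', $html, -1, $count);
    } while ($count > 0);
    return $html;
}

echo stripEmptyTags('<div><p>&nbsp;</p></div><p>Text</p>'), "\n";
// prints: <p>Text</p>
```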

Upvotes: 6

pix0r

Reputation: 31280

That regexp is a little odd, but it looks like it might work. You could try this instead:

$pattern = ':<[^/>]*>\s*</[^>]*>:';
$str = preg_replace($pattern, '', $str);
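A small sketch of this pattern in use. Using `:` as the delimiter means the `/` in closing tags is an ordinary character. As written it only handles whitespace, since `\s` does not match the literal `&nbsp;` entity, so `<p>&nbsp;</p>` survives. `stripWhitespaceOnlyTags()` is a hypothetical name.

```php
<?php
// Hypothetical helper around the ':'-delimited pattern.
function stripWhitespaceOnlyTags(string $str): string
{
    // Opening tag, optional whitespace, any closing tag.
    $pattern = ':<[^/>]*>\s*</[^>]*>:';
    return preg_replace($pattern, '', $str);
}

echo stripWhitespaceOnlyTags('<p> </p><p>&nbsp;</p><p>Hi</p>'), "\n";
// prints: <p>&nbsp;</p><p>Hi</p>
```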

It's very similar, though.

Upvotes: 1
