Reputation: 1918
I need to parse an HTML string and remove all the elements which contain only empty children.
Example:
<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" LETTERSPACING="0" KERNING="1"><B></B></FONT></P>
contains no information and must be replaced with <br/>.
I wrote a regex like this:
<\w+\b[^>]*>(<\w+\b[^>]*>\s*</\w*\s*>)*</\w*\s*>
but the problem is that it matches only two of the three levels. In the example above, the <P>
element (the outermost one) is not selected.
Can you help me fix this regex?
Upvotes: 2
Views: 5595
Reputation: 2032
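Please try this:
function removeEmptyElements(str, iterations) {
  // Matches an element whose content is empty or whitespace only, e.g. <B></B>.
  // [a-z] with the i flag replaces [A-z], which also matched characters such as [ and _.
  var re = /<([a-z][a-z0-9]*)\b([^>\/]*)>\s*<\/\1>/gi;
  var subst = '';
  // Each pass strips the innermost empty elements, so nesting of depth n needs n passes.
  for (var i = 0; i < iterations; i++) {
    str = str.replace(re, subst);
  }
  return str;
}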
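If the nesting depth is not known in advance, the same idea can be run until the string stops changing instead of for a fixed number of iterations. The following is only a sketch under that assumption; the function name `removeEmptyElementsUntilStable` is hypothetical, and the pattern here allows any attributes (including ones containing `/`), which is slightly looser than the regex above.

```javascript
// Hypothetical helper: repeat the replacement until nothing more is removed.
function removeEmptyElementsUntilStable(str) {
  // An element whose content is empty or whitespace only, e.g. <B></B> or <i> </i>.
  const emptyElement = /<([a-z][a-z0-9]*)\b[^>]*>\s*<\/\1\s*>/gi;
  let previous;
  do {
    previous = str;
    // Each pass removes the innermost empty elements, exposing their parents.
    str = str.replace(emptyElement, '');
  } while (str !== previous);
  return str;
}

const input = '<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" LETTERSPACING="0" KERNING="1"><B></B></FONT></P>';
console.log(removeEmptyElementsUntilStable(input) === ''); // true
```

The loop removes the elements entirely; a caller who wants a line break instead could substitute it after the final pass.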
Upvotes: 0
Reputation: 425053
This regex seems to work:
/(<(?!\/)[^>]+>)+(<\/[^>]+>)+/
See a live demo with your example.
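As a rough illustration of how this pattern behaves on the example from the question, here is a small sketch. The `g` flag and the `<br/>` replacement string are assumptions added for the demonstration; they are not part of the regex above.

```javascript
// One or more opening tags immediately followed by one or more closing tags.
const emptyRun = /(<(?!\/)[^>]+>)+(<\/[^>]+>)+/g;

const input = '<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" LETTERSPACING="0" KERNING="1"><B></B></FONT></P>';
const output = input.replace(emptyRun, '<br/>');
console.log(output); // <br/>

// Elements that contain text are left alone.
console.log('<p>text</p>'.replace(emptyRun, '<br/>')); // <p>text</p>
```

Note that the pattern does not check that the closing tags correspond to the opening ones, so a fragment such as `<br></p>` also matches, and it does not allow whitespace between the tags.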
Upvotes: 5
Reputation: 403
Use jQuery and walk the children. For each child, check whether .html() is empty. If it is, delete the current element (or its parent, if you prefer) with .remove().
For each string:
var appended = $('.yourparent').append('YOUR HTML STRING');
appended.children().each(function () {
  // Inside .each(), `this` is a plain DOM element, so wrap it in $() before calling jQuery methods.
  if ($(this).html() === '') {
    // Use $(this).parent().remove() instead to delete the parent.
    $(this).remove();
  }
});
This appends the items first and then deletes any children that are empty.
Upvotes: 2