Reputation: 1918

Regex to remove empty HTML tags that contain only empty children

I need to parse an HTML string and remove all the elements which contain only empty children.

Example:

<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" LETTERSPACING="0" KERNING="1"><B></B></FONT></P>

contains no information and must be replaced with </br>

I wrote a regex like this:

<\w+\b[^>]*>(<\w+\b[^>]*>\s*</\w*\s*>)*</\w*\s*>

but the problem is that it only catches two of the three levels. In the above example, the <P> element (the outermost one) is not selected.

Can you help me fix this regex?

Upvotes: 2

Answers (3)

Hoàng Vũ Tgtt

Reputation: 2032

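Please try this:

function removeEmptyElements(str, iterations){
    // [a-z] plus the i flag; [A-z] would also match [ \ ] ^ _ and `
    var re = /<([a-z][a-z0-9]*)\b[^>]*>\s*<\/\1\s*>/gi;
    var subst = '';
    
    // each pass strips the innermost empty elements, so use one pass per nesting level
    for(var i = 0; i < iterations; i++){
        str = str.replace(re, subst);
    }
    
    return str;
}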
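
The iteration count has to be at least the nesting depth, which is not always known in advance. A minimal sketch of a variant that loops until the string stops changing is shown here; the function name is hypothetical, and the pattern assumes attribute values never contain a `>` character.

```javascript
// Hypothetical helper: repeats the replacement until nothing changes,
// so the caller does not need to know the nesting depth.
function removeEmptyElementsUntilStable(html) {
    // An opening tag, optional whitespace, then the matching closing tag.
    var re = /<([a-z][a-z0-9]*)\b[^>]*>\s*<\/\1\s*>/gi;
    var previous;
    do {
        previous = html;
        html = html.replace(re, '');
    } while (html !== previous);
    return html;
}

var sample = '<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" ' +
    'LETTERSPACING="0" KERNING="1"><B></B></FONT></P>';
console.log(removeEmptyElementsUntilStable(sample) === ''); // true
console.log(removeEmptyElementsUntilStable('<p>Hello<b></b></p>')); // <p>Hello</p>
```

Note that this removes the empty elements outright. Substituting a line break on every pass would not work, because the inserted tag would make the parent non-empty.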

Upvotes: 0

Bohemian

Reputation: 425053

This regex seems to work:

/(<(?!\/)[^>]+>)+(<\/[^>]+>)+/

See a live demo with your example.
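
As a rough sketch of how this pattern could be applied to the question's goal, the snippet below adds the `g` flag and replaces each match with a line break. The helper name and the choice of `<br>` as the replacement are assumptions for illustration.

```javascript
// The pattern from this answer, with the g flag added so every run is replaced.
var emptyRun = /(<(?!\/)[^>]+>)+(<\/[^>]+>)+/g;

// Hypothetical helper: swaps each run of opening tags that is immediately
// followed by a run of closing tags for a single line break.
function replaceEmptyRuns(html) {
    return html.replace(emptyRun, '<br>');
}

var sample = '<P ALIGN="left"><FONT FACE="Arial" SIZE="12" COLOR="#000000" ' +
    'LETTERSPACING="0" KERNING="1"><B></B></FONT></P>';
console.log(replaceEmptyRuns(sample)); // <br>
console.log(replaceEmptyRuns('a<p><b></b></p>b')); // a<br>b
```

The pattern does not check that opening and closing tags pair up, and it allows no whitespace between tags. For instance, `<p>text<br></p>` becomes `<p>text<br>` because `<br></p>` is treated as an empty run, so it is probably safest on input shaped like the example.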

Upvotes: 5

Philipp

Reputation: 403

Use jQuery and walk through all the children. For each child, check whether .html() returns an empty string. If it does, delete the current element (or the parent, if you prefer) with .remove().

Do this for each string:

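var appended = $('.yourparent').append('YOUR HTML STRING');

appended.children().each(function () 
{
    // `this` is a plain DOM element here, so wrap it to use jQuery methods
    var $el = $(this);
    if($el.html() === '')
    {
        $el.remove(); // or $el.parent().remove() to drop the parent instead
    }
});

This adds the items first and then removes the children that are empty.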
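
The check above only looks at direct children, so an element whose children are themselves empty wrappers is not caught. A possible recursive sketch follows, written with plain DOM properties (`children`, `textContent`, `remove()`) rather than jQuery; the function name and the list of tags to keep are assumptions.

```javascript
// Assumption: elements that carry meaning without any text are kept.
var KEEP = /^(?:br|hr|img|input)$/i;

// Hypothetical helper: prunes descendants depth-first, so a parent is
// checked only after its own empty children have already been removed.
function pruneEmptyElements(root) {
    // Copy the collection first, because removing nodes mutates it.
    Array.from(root.children).forEach(function (child) {
        pruneEmptyElements(child);
        var hasNoChildren = child.children.length === 0;
        var hasNoText = child.textContent.trim() === '';
        if (hasNoChildren && hasNoText && !KEEP.test(child.tagName)) {
            child.remove();
        }
    });
}

// In a browser, with the HTML string already appended as above:
// pruneEmptyElements(document.querySelector('.yourparent'));
```

Working on the parsed DOM avoids the pairing and attribute pitfalls of a regex, at the cost of needing a browser or another DOM implementation.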

Upvotes: 2
