Erçin Dedeoğlu

Reputation: 5383

Remove Unused (Empty) HTML Tags

I'm looking for a way to remove all HTML tags that are empty, i.e. that contain nothing or only whitespace.

For example:

<p></p><div> to make links</div><b> </b>
<a href="http://foo.com"></a><p> for linebreak add 2 spaces at end
</p><strong></strong><i></i>

To:

<div> to make links</div><p> for linebreak add 2 spaces at end</p>

I'm sure this is not a duplicate.

Upvotes: 0

Views: 3698

Answers (3)

Federico Piazza

Reputation: 30985

You could use a regex like this:

<(\w+)\s*.*?>\s*?</\1>

The idea is to look for tags (with or without attributes) that contain nothing or only whitespace. For the sample input you added, the output is:

<div> to make links</div>
<p> for linebreak add 2 spaces at end
</p>
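
Since the other answers in this thread use .NET, here is a minimal sketch of how the pattern above might be applied from C#. The class name EmptyTagDemo and the method name RemoveEmptyTags are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class EmptyTagDemo
{
    // Removes every match of the pattern from this answer.
    // "." does not match a newline by default, so an opening tag and
    // its attributes must sit on a single line to be matched.
    public static string RemoveEmptyTags(string html)
    {
        return Regex.Replace(html, @"<(\w+)\s*.*?>\s*?</\1>", string.Empty);
    }
}
```

Calling EmptyTagDemo.RemoveEmptyTags on the sample input from the question should give the output shown above. One caveat worth knowing: because .*? can run across other tags on the same line, an input such as <div>a</div><div></div> is removed as a whole, text included, so this pattern is best treated as a quick fix rather than a general solution.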

Upvotes: 1

Erçin Dedeoğlu

Reputation: 5383

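// Requires: using System.Text.RegularExpressions;
// Extension method, so it must be declared inside a static class.
// Removes elements that are empty or contain only whitespace; the opening tag
// may carry attributes (double-quoted, single-quoted, unquoted, or valueless).
public static string RemoveUnusedTags(this string source)
{
    return Regex.Replace(source, @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>", string.Empty, RegexOptions.Multiline);
}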

Upvotes: 1

Dai

Reputation: 155045

Using the question "Regular expression to match empty HTML tags that may contain embedded JSTL?" as a starting point, we have the regex <(\w+)(?:\s+\w+="[^"]+(?:"\$[^"]+"[^"]+)?")*>\s*</\1>.

Then it's just a matter of feeding this into .NET's Regex engine:

Regex r = new Regex(@"<(\w+)(?:\s+\w+=""[^""]+(?:""\$[^""]+""[^""]+)?"")*>\s*</\1>");
String output = r.Replace( inputString, String.Empty );

This regular expression will match any text of the form <foo bar="baz"> </foo>, where attributes are entirely optional (but, if present, must have non-empty double-quoted values) and only whitespace may appear between the opening and closing tags.
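
A single pass only removes tags that are already empty, so an element that becomes empty once its empty children are gone, such as <div><p></p></div>, is left behind as <div></div>. A minimal sketch of one way to handle this is to repeat the replacement until the string stops changing. The names EmptyTagCleaner and RemoveEmptyTagsRepeatedly are hypothetical and chosen only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class EmptyTagCleaner
{
    // Same pattern as above, compiled once and reused.
    private static readonly Regex EmptyTag = new Regex(
        @"<(\w+)(?:\s+\w+=""[^""]+(?:""\$[^""]+""[^""]+)?"")*>\s*</\1>");

    // Re-applies the replacement until a pass changes nothing, so that
    // parents emptied by an earlier pass are removed as well.
    public static string RemoveEmptyTagsRepeatedly(string html)
    {
        string previous;
        do
        {
            previous = html;
            html = EmptyTag.Replace(html, string.Empty);
        } while (html != previous);
        return html;
    }
}
```

For example, RemoveEmptyTagsRepeatedly applied to <div><p class="x"> </p></div><span>keep</span> first drops the inner <p>, then the now-empty <div>, and leaves <span>keep</span>. The loop always terminates because every pass that changes the string makes it shorter.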

Upvotes: 0
