Reputation: 5383
I'm looking for a way to clear/remove all HTML tags that have nothing in them...
For example:
<p></p><div> to make links</div><b> </b>
<a href="http://foo.com"></a><p> for linebreak add 2 spaces at end
</p><strong></strong><i></i>
To:
<div> to make links</div><p> for linebreak add 2 spaces at end</p>
// I'm sure this is not a duplicate.
Upvotes: 0
Views: 3698
Reputation: 30985
You could use a regex like this:
<(\w+)\s*.*?>\s*?</\1>
The idea is to look for tags (with or without attributes) that contain nothing but optional whitespace. For the sample input you added, the output is:
<div> to make links</div>
<p> for linebreak add 2 spaces at end
</p>
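To make the pattern above concrete, here is a minimal C# sketch that applies it with Regex.Replace. The class name SimpleEmptyTagFilter and the method name Strip are made up for illustration; only the pattern itself comes from this answer.

```csharp
using System.Text.RegularExpressions;

public static class SimpleEmptyTagFilter
{
    // Pattern from the answer: a tag name, anything up to a '>',
    // optional whitespace, then the matching closing tag via the \1 backreference.
    private static readonly Regex EmptyTag = new Regex(@"<(\w+)\s*.*?>\s*?</\1>");

    // Hypothetical helper: removes every match in a single pass.
    public static string Strip(string html)
    {
        return EmptyTag.Replace(html, string.Empty);
    }
}
```

Calling SimpleEmptyTagFilter.Strip on the sample from the question (with plain \n line breaks) leaves the div and p elements shown above. One thing to watch: because .*? can run across other tags on the same line, an input such as <b>x<i></b> is removed in full, text included, so this pattern is best kept to simple, well-formed markup.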
Upvotes: 1
Reputation: 5383
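// Requires "using System.Text.RegularExpressions;" and must be declared inside a static class (extension method).
public static string RemoveUnusedTags(this string source)
{
    // Removes elements whose body is empty or whitespace-only; attribute values may be double-quoted, single-quoted, or unquoted.
    return Regex.Replace(source, @"<(\w+)\b(?:\s+[\w\-.:]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[\w\-.:]+))?)*\s*/?>\s*</\1\s*>", string.Empty, RegexOptions.Multiline);
}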
Upvotes: 1
Reputation: 155045
Using this Q&A as a starting point (Regular expression to match empty HTML tags that may contain embedded JSTL?), we have the regex <(\w+)(?:\s+\w+="[^"]+(?:"\$[^"]+"[^"]+)?")*>\s*</\1>.
Then it's just a matter of feeding this into .NET's Regex engine:
Regex r = new Regex(@"<(\w+)(?:\s+\w+=""[^""]+(?:""\$[^""]+""[^""]+)?"")*>\s*</\1>"); // every " is doubled inside the verbatim string
String output = r.Replace( inputString, String.Empty );
This regular expression will match any text of the form <foo bar="baz"> </foo>, where attributes are entirely optional and only whitespace may appear between the opening and closing tags.
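Since the pattern only allows whitespace between the opening and closing tags, a single Replace call leaves behind an element that merely wrapped other empty elements (for example <div><p></p></div> becomes <div></div>). A possible way around that, sketched below, is to repeat the replacement until the text stops changing. The names EmptyTagCleaner and RemoveEmptyTagsRecursively are hypothetical and not part of the answer.

```csharp
using System.Text.RegularExpressions;

public static class EmptyTagCleaner
{
    // Pattern from the answer, with every quote doubled for the verbatim string.
    private static readonly Regex EmptyTag = new Regex(
        @"<(\w+)(?:\s+\w+=""[^""]+(?:""\$[^""]+""[^""]+)?"")*>\s*</\1>");

    // Hypothetical helper: repeats the replacement until nothing changes,
    // so an element that only wrapped other empty elements is removed as well.
    public static string RemoveEmptyTagsRecursively(string html)
    {
        string previous;
        do
        {
            previous = html;
            html = EmptyTag.Replace(html, string.Empty);
        } while (html != previous);
        return html;
    }
}
```

The loop always terminates because each pass either shortens the string or leaves it unchanged. Whether nested empty wrappers should be removed at all depends on the markup, so treat this as an optional extension.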
Upvotes: 0