Reputation: 2891
I have a small bit of regex that strips out all HTML, and it works great. What I need to do now is strip out all HTML but keep the <b> and <strong> tags intact.
Any help would be greatly appreciated.
shortDesc = System.Text.RegularExpressions.Regex.Replace(shortDesc, @"<[^>]*>", String.Empty);
Thanks!
Upvotes: 1
Views: 1665
Reputation: 34385
Here is a simple extension of your regex that should work pretty well:
Regex re = new Regex(@"<(?!/?(?:strong|b)\b)[^>]*>",
RegexOptions.IgnoreCase);
text = re.Replace(text, "");
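To see what the negative lookahead does in practice, here is a minimal sketch that wraps the same pattern in a helper. The class name HtmlStripper, the method name StripExceptBold, and the sample input are made up for illustration; only the pattern comes from the answer above. The \b after the alternation is what stops <br> from being mistaken for <b>.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from the answer above.
public static class HtmlStripper
{
    // "<" not followed by an optional "/" and then "strong" or "b" as a whole word,
    // then everything up to the next ">". Tags that fail the lookahead are left alone.
    private static readonly Regex NonBoldTag = new Regex(
        @"<(?!/?(?:strong|b)\b)[^>]*>",
        RegexOptions.IgnoreCase);

    public static string StripExceptBold(string text)
    {
        // Remove every matched tag; <b>, </b>, <strong>, </strong> never match.
        return NonBoldTag.Replace(text, string.Empty);
    }
}
```

Calling HtmlStripper.StripExceptBold("<p>Keep <b>this</b> and <strong>that</strong>, drop <i>this</i><br/></p>") should leave "Keep <b>this</b> and <strong>that</strong>, drop this". Note that a kept tag keeps its attributes too, so <b class="x"> passes through unchanged.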
Upvotes: 2
Reputation: 713
From what I gathered in your comments, careful use of regular expressions (though usually shunned for obvious reasons) could be employed, provided that you meet the following requirements:
If the HTML page is under your control, it is fairly reasonable to assume that you can meet both conditions; otherwise I wouldn't bother.
In your case, you can use the overload of the Regex.Replace method that accepts a MatchEvaluator delegate along with its other parameters.
Usage example:
MatchEvaluator replaceCallback = new MatchEvaluator(MatchHandler);
Regex RE = new Regex(matchPattern, RegexOptions.Multiline);
string newString = RE.Replace(source, replaceCallback);
MatchHandler example:
public static string MatchHandler(Match theMatch) {
    if (theMatch.Value.StartsWith("<b>") || ...) {
        return theMatch.Value; // return as is
    }
    // else return empty string
    return "";
}
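The fragments above leave matchPattern and the remaining conditions open. One way to fill them in, purely as an assumption on my part, is to match every tag with the pattern from the question and let the callback decide by tag name. TagFilter, KeepBoldTags, and the TagName helper regex are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical complete version of the MatchEvaluator approach.
public static class TagFilter
{
    // Assumed pattern: any tag at all; the callback decides what survives.
    private static readonly Regex AnyTag = new Regex(@"<[^>]*>");

    // Assumed helper: captures the element name of an opening or closing tag.
    private static readonly Regex TagName = new Regex(@"^</?([A-Za-z][A-Za-z0-9]*)");

    public static string KeepBoldTags(string source)
    {
        return AnyTag.Replace(source, new MatchEvaluator(MatchHandler));
    }

    private static string MatchHandler(Match theMatch)
    {
        Match name = TagName.Match(theMatch.Value);
        if (name.Success)
        {
            string tag = name.Groups[1].Value;
            // Compare the whole name so that <br> is not treated as <b>.
            if (tag.Equals("b", StringComparison.OrdinalIgnoreCase) ||
                tag.Equals("strong", StringComparison.OrdinalIgnoreCase))
            {
                return theMatch.Value; // keep the tag as is
            }
        }
        return string.Empty; // drop every other tag
    }
}
```

With this sketch, TagFilter.KeepBoldTags("<div><b>a</b><br><strong>b</strong></div>") should return "<b>a</b><strong>b</strong>". Comparing the extracted name rather than using StartsWith also keeps tags that carry attributes.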
Upvotes: 0
Reputation: 943143
Upvotes: 4