Reputation: 6531
I currently have an extension method for removing any HTML from strings.
Regex.Replace(s, @"<(.|\n)*?>", string.Empty);
This works fine on the whole. However, I am occasionally passed strings that contain both standard HTML markup and encoded markup (I don't control the source data, so I can't correct it at the point of entry), e.g.
<p>&lt;p&gt;Sample text&lt;/p&gt;</p>
I need an expression that will remove both encoded and non-encoded HTML (whether it be paragraph tags, anchor tags, formatting tags etc.) from a string.
Upvotes: 2
Views: 4731
Reputation: 18430
I think you can do that in two passes with your existing extension method.
First replace the usual unencoded tags, then decode the returned string and run the replacement again. Simple.
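A minimal sketch of that two-pass idea, assuming `System.Net.WebUtility.HtmlDecode` is available for the decode step (`HttpUtility.HtmlDecode` from `System.Web` would serve the same purpose). The class name `HtmlStringExtensions` and the method name `StripHtml` are made up for illustration; the regex is the one from the question.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical names: HtmlStringExtensions and StripHtml are illustrative only.
public static class HtmlStringExtensions
{
    // Same pattern as in the question: a '<', then as few characters
    // (including newlines) as possible, then a '>'.
    private static readonly Regex TagPattern = new Regex(@"<(.|\n)*?>");

    public static string StripHtml(this string s)
    {
        if (string.IsNullOrEmpty(s))
        {
            return s;
        }

        // Pass 1: remove the literal (unencoded) tags.
        string withoutTags = TagPattern.Replace(s, string.Empty);

        // Decode entities so that "&lt;p&gt;" becomes "<p>".
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // Pass 2: remove the tags that were hidden behind the encoding.
        return TagPattern.Replace(decoded, string.Empty);
    }
}
```

With this sketch, `"<p>&lt;p&gt;Sample text&lt;/p&gt;</p>".StripHtml()` returns `Sample text`. One caveat: after decoding, legitimately encoded angle brackets are treated as markup too, so `a &lt; b and c &gt; d` loses everything between the `<` and the `>`.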
Upvotes: 5