shakin

Reputation: 171

VB.NET or C#: strip HTML but leave less-than and greater-than signs

I have a string variable that contains the following HTML:

<p> <em><strong>This is some <span style="background-color: rgb(255, 255, 0);">rich </span>text. 3 < 5 is a valid statement. <br /> </strong></em></p>

I need to strip out the HTML but keep any less-than or greater-than signs, in case the data contains mathematical expressions (like the "3 < 5" portion of the string). I cannot use third-party applications or tools because of restrictions on our site, and would prefer something that is available in .NET Framework 3.5. I have tried the following regular expressions, but they do not handle the less-than/greater-than symbols:

<[^>]*>
<[^>]+>
<(.|\n)*?>
\<[^\>]*\>
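To illustrate why these patterns fail, here is a minimal C# sketch that runs the first one over the tail of the sample string. The class name `NaiveStripDemo` and the helper `StripNaive` are only illustrative names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveStripDemo
{
    // Hypothetical helper: applies the first pattern from the list above.
    public static string StripNaive(string html)
    {
        return Regex.Replace(html, "<[^>]*>", "");
    }

    public static void Main()
    {
        string input = "3 < 5 is a valid statement. <br />";

        // The match starts at the bare "<" and runs to the ">" that closes <br />,
        // so the sentence in between is removed together with the tag.
        Console.WriteLine("[" + StripNaive(input) + "]");
        // prints: [3 ]
    }
}
```

The pattern has no way to tell a bare `<` from the start of a tag, so everything up to the next `>` is treated as one tag.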

I have also tried the code at this link, but it does not handle the less-than/greater-than symbols either.

Any suggestions are greatly appreciated.

Upvotes: 1

Views: 1596

Answers (1)

David Sulc

Reputation: 25994

Replace all text matching this pattern with an empty string:

(<[^\s]+[^<>]*>)+

(I tested it on Rubular.com, but it should work for C# too.)

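Apparently the code should be as follows (assuming RegexObj is a Regex built from the pattern above; a verbatim string is used because the HTML itself contains double quotes):

RegexObj.Replace(@"<p> <em><strong>This is some <span style=""background-color: rgb(255, 255, 0);"">rich </span>text. 3 < 5 is a valid statement. <br /> </strong></em></p>", "")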
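For completeness, here is a self-contained sketch of how the suggested pattern might be used from C# on .NET 3.5. The names `HtmlStripper`, `TagPattern` and `StripTags` are hypothetical and chosen only for this example. The pattern itself is the one given in this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // Pattern copied from the answer; the field name is only illustrative.
    // "<" must be followed directly by a non-whitespace character to count as a tag.
    private static readonly Regex TagPattern = new Regex(@"(<[^\s]+[^<>]*>)+");

    // Hypothetical helper: removes every run of tags matched by TagPattern.
    public static string StripTags(string html)
    {
        return TagPattern.Replace(html, "");
    }

    public static void Main()
    {
        string html = @"<p> <em><strong>This is some <span style=""background-color: rgb(255, 255, 0);"">rich </span>text. 3 < 5 is a valid statement. <br /> </strong></em></p>";

        // Trim() drops the whitespace left behind where the outer tags were.
        Console.WriteLine(StripTags(html).Trim());
        // prints: This is some rich text. 3 < 5 is a valid statement.
    }
}
```

The pattern relies on the `<` in "3 < 5" being followed by a space. It is a heuristic, not an HTML parser: a bare `>` that follows a tag with no whitespace before the text, as in `<b>5 > 3</b>`, is still consumed along with the tag, so input like that would need extra handling.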

Upvotes: 3
