vasmay

Reputation: 1429

Remove style tags, CSS, scripts and HTML tags from HTML to plain text

Using regular expressions in ASP.NET (C#), how do I strip style tags, CSS, scripts and HTML tags from an HTML string so that only the plain text remains?

Upvotes: 1

Views: 2666

Answers (1)

Doug

Reputation: 6518

I don't think a regex is the right tool for this. That said, the following regex removes the tags themselves if you run a regex replace, although the text inside style and script elements is left behind:

<[^>]*>

To use it with Regex.Replace, do the following:

// Requires: using System.Text.RegularExpressions;
string myHtmlString = "<html><body>my test text</body></html>";

string myPlainTextString = Regex.Replace(myHtmlString, "<[^>]*>", String.Empty);
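
Since the question also asks about CSS and scripts, here is a minimal sketch of a regex-only approach that first drops script and style elements together with their contents, and only then strips the remaining tags. The class name HtmlText and the method name ToPlainText are made up for illustration, and regexes like these will still trip over malformed or unusual markup, so treat it as a starting point rather than a robust solution.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (name chosen for illustration only).
public static class HtmlText
{
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // 1. Remove <script> and <style> elements including their contents.
        //    Singleline lets '.' span line breaks; \1 matches the same tag name.
        string text = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // 2. Remove HTML comments.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);

        // 3. Remove every remaining tag (the regex from the answer above).
        //    A space is used instead of String.Empty so that adjacent
        //    block elements do not run together.
        text = Regex.Replace(text, @"<[^>]*>", " ");

        // 4. Decode entities such as &amp; and &nbsp;.
        text = WebUtility.HtmlDecode(text);

        // 5. Collapse runs of whitespace into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Replacing tags with a space keeps "<p>a</p><p>b</p>" from becoming "ab", at the cost of splitting words that contain inline tags (for example "te<b>st</b>"). Use String.Empty in step 3 if that matters more for your input.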

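I recommend using something like the Html Agility Pack instead (http://htmlagilitypack.codeplex.com/), because it parses the HTML properly and makes this even easier. With a helper built on top of it, here called "ConvertToPlainText" (as far as I know this is a helper you supply yourself, not a method of the library's core classes), the call becomes: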

string myHtmlString = "<html><body>my test text</body></html>";

string myPlainTextString = ConvertToPlainText(myHtmlString);
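
For reference, a helper like that could be sketched with the Html Agility Pack roughly as follows. This assumes the HtmlAgilityPack NuGet package is referenced. The class name HtmlAgilityText and the method body are illustrative, not code shipped with the library. It uses only HtmlDocument.LoadHtml, SelectNodes, HtmlNode.Remove, InnerText and HtmlEntity.DeEntitize.

```csharp
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper class (name chosen for illustration only).
public static class HtmlAgilityText
{
    public static string ConvertToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html ?? string.Empty);

        // SelectNodes returns null when nothing matches, so guard against that.
        var unwanted = doc.DocumentNode.SelectNodes("//script|//style");
        if (unwanted != null)
        {
            // Copy to an array first so removal cannot disturb the iteration.
            foreach (var node in unwanted.ToArray())
            {
                node.Remove();
            }
        }

        // InnerText concatenates the remaining text nodes;
        // DeEntitize turns entities such as &amp; back into characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }
}
```

Unlike the regex version, this lets the parser deal with odd attribute quoting, missing closing tags, and so on.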

Upvotes: 1
