Luke

Reputation: 1814

How to parse the text out of HTML in C#

I have an HTML string like this:

 "This is <h4>Some</h4> Text" + Environment.NewLine +
 "This is some more <h5>text</h5>"

I want to extract only the text, so the result should be:

"This is Some Text" + Environment.NewLine +
 "This is some more text"

How do I do this?

Upvotes: 3

Views: 4823

Answers (2)

L.B

Reputation: 116188

Use HtmlAgilityPack: load the markup into a document and read InnerText from its root node.

string html = @"This is <h4>Some</h4> Text" + Environment.NewLine +
                "This is some more <h5>text</h5>";

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
// InnerText concatenates the text nodes and drops the tags.
var str = doc.DocumentNode.InnerText;

Upvotes: 8

mrówa

Reputation: 5781

A simple alternative is a regex: Regex.Replace(source, "<.*?>", string.Empty);
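To make the one-liner easier to try, here is a minimal sketch that wraps it in a helper. The class name HtmlTextExtractor and the method name StripTags are made up for illustration and are not part of any library. The entity-decoding step with WebUtility.HtmlDecode is an addition to the answer, on the assumption that the input may contain entities such as &amp;amp;.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from any library.
public static class HtmlTextExtractor
{
    public static string StripTags(string source)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }

        // Non-greedy match: everything from '<' up to the nearest '>' is removed.
        // '.' does not match a newline by default, so a tag split across lines is left alone.
        string withoutTags = Regex.Replace(source, "<.*?>", string.Empty);

        // Assumed extra step: turn entities such as &amp; back into plain characters.
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

Calling HtmlTextExtractor.StripTags on the string from the question should leave "This is Some Text" and "This is some more text" on their two lines. This approach is only reasonable for simple, well-formed fragments: a '>' inside an attribute value, an HTML comment, or the body of a script or style element will trip it up, and a real parser handles those cases better.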

Upvotes: 1
