Reputation: 5022
Are there any free/open-source C# libraries for extracting data from HTML?
Given the input below:
<div style="...">
text part 1
</div>
<div style="...">
text part 2
</div>
I want the output to be:
text part 1 text part 2
Upvotes: 1
Views: 5986
Reputation: 14133
Yes, you can use HtmlAgilityPack, which lets you parse HTML and run XPath queries against it as if it were XML.
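As a rough sketch of what that could look like for the input in the question (the class and method names here are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced):

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class HtmlTextExtractor
{
    // Hypothetical helper: returns the text of every <div>, joined by single spaces.
    public static string ExtractDivText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var divs = doc.DocumentNode.SelectNodes("//div");
        if (divs == null)
        {
            return string.Empty;
        }

        // Decode entities such as &amp; and trim the line breaks around each text.
        var parts = divs
            .Select(d => HtmlEntity.DeEntitize(d.InnerText).Trim())
            .Where(t => t.Length > 0);

        return string.Join(" ", parts);
    }
}
```

For the two divs in the question this should give "text part 1 text part 2". Note that "//div" also matches nested divs, so with nested markup the inner text would show up more than once and the query would need tightening.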
Upvotes: 6
Reputation: 17477
You can use HtmlAgilityPack, which is a very good library. For example:
using System.Text;
using HtmlAgilityPack;

public string StripHTMLTags(string str)
{
    StringBuilder pureText = new StringBuilder();
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(str);
    // InnerText of each top-level node is its content with all tags removed.
    foreach (HtmlNode node in doc.DocumentNode.ChildNodes)
    {
        pureText.Append(node.InnerText);
    }
    return pureText.ToString();
}
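One thing to watch: InnerText keeps the line breaks that sit between and inside the tags, so the returned string will probably not be on a single line as the question asks. A small post-processing step, sketched here with plain .NET regex (the class and method names are only illustrative), could collapse the whitespace:

```csharp
using System.Text.RegularExpressions;

public static class TextCleanup
{
    // Hypothetical helper: replaces every run of whitespace (spaces, tabs,
    // line breaks) with one space and trims both ends.
    public static string CollapseWhitespace(string text)
    {
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Calling TextCleanup.CollapseWhitespace on "\ntext part 1\n\n\ntext part 2\n" returns "text part 1 text part 2".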
Upvotes: 4