kseen

Reputation: 397

Built-in way to parse any raw HTML

I'm starting to write an app that should retrieve the content of meta tags from any given HTML page. Since I'm targeting .NET 2.0, I can't use LINQ to XML or anything more modern. So I tried the XmlDocument class. Unfortunately, it can't handle documents that are not well-formed XML, which most HTML pages are not.

I can't even use HtmlAgilityPack, because I plan to sell this app in the future, so its license probably doesn't fit commercial use.

Working with XmlReader seems too hard.

So, how would you guys manage this issue?


POST EDIT

Another reason I'd rather avoid HtmlAgilityPack is that it is such a large library to add to my project. I'd be happier keeping the project as small as possible.

Do you guys really advise me to use HtmlAgilityPack anyway?

Upvotes: 0

Views: 923

Answers (2)

abatishchev

Reputation: 100248

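System.Windows.Forms.WebBrowser browser = new System.Windows.Forms.WebBrowser();
browser.Navigate("about:blank"); // Document is null until the control has loaded something
HtmlDocument doc = browser.Document.OpenNew(true);
doc.Write("<HTML><BODY>This is a new HTML document.</BODY></HTML>");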

See MSDN.

Note that this is a Windows Forms control, so you may run into issues when using it outside a Windows Forms app.
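To tie this back to the question, here is a rough sketch of how the meta tags could be read once the HTML has been written into the document. MetaTagReader and ReadMetaTags are hypothetical names, not part of any library, and the sketch assumes the code runs on an STA thread (for example under [STAThread] in a Windows Forms app). It is untested outside that setting, so treat it as a starting point rather than a definitive implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

static class MetaTagReader
{
    // Hypothetical helper: returns name -> content for every <meta name="..."> tag.
    // Assumption: called on an STA thread, since WebBrowser wraps an ActiveX control.
    public static Dictionary<string, string> ReadMetaTags(string html)
    {
        Dictionary<string, string> result =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        using (WebBrowser browser = new WebBrowser())
        {
            browser.ScriptErrorsSuppressed = true;

            // Document is null until the control has loaded something.
            browser.Navigate("about:blank");
            if (browser.Document == null)
            {
                return result;
            }

            HtmlDocument doc = browser.Document.OpenNew(true);
            doc.Write(html);

            foreach (HtmlElement meta in doc.GetElementsByTagName("meta"))
            {
                string name = meta.GetAttribute("name");
                string content = meta.GetAttribute("content");

                // Skip tags without a name, such as http-equiv or charset declarations.
                if (!string.IsNullOrEmpty(name))
                {
                    result[name] = content;
                }
            }
        }

        return result;
    }
}
```

Calling MetaTagReader.ReadMetaTags(html) with the downloaded page source should then give a dictionary you can query for keys such as "description" or "keywords". The code sticks to C# 2.0 syntax (no var, no LINQ) to match the .NET 2.0 constraint in the question.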

Upvotes: 0

BrokenGlass

Reputation: 160852

I can't even use HtmlAgilityPack, because I plan to sell this app in the future, so its license probably doesn't fit commercial use.

HtmlAgilityPack uses the Microsoft Public License (Ms-PL), which allows you to use it in a commercial product; it's a very liberal license. Also see "How does MS-PL license work?" and the Microsoft Public License (Ms-PL) text.

Upvotes: 5
