Reputation: 397

Built-in way to parse any raw HTML

I'm starting to write an app that should retrieve the content of meta tags from any given HTML page. Since I'm targeting .NET 2.0, I can't use LINQ to XML or anything more modern. So I tried the XmlDocument class. Unfortunately, it can't handle invalid XML, which is what most HTML pages are.

I can't even use HtmlAgilityPack, because I plan to sell this app in the future, so it probably isn't suitable for commercial use.

Working with XmlReader seems too hard.

So, how would you guys manage this issue?

POST EDIT

Another reason I'd rather avoid HtmlAgilityPack is that it's such a large library to add to my project. I'd be happier keeping the project as small as possible.

Do you guys really advise me to use HtmlAgilityPack anyway?

Upvotes: 0

Answers (2)

abatishchev

Reputation: 100348

HtmlDocument doc = new System.Windows.Forms.WebBrowser().Document.OpenNew(true);
doc.Write("<HTML><BODY>This is a new HTML document.</BODY></HTML>");

See MSDN.

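Note that this is a Windows Forms (WinForms) control, not a WebForms one, so you may run into issues using it outside a WinForms app. Also note that WebBrowser.Document is null until the control has loaded a page, so you may need to navigate to something such as about:blank first.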

Upvotes: 0

BrokenGlass

Reputation: 160992

I can't even use HtmlAgilityPack, because I plan to sell this app in the future, so it probably isn't suitable for commercial use.

HtmlAgilityPack is released under the Microsoft Public License (Ms-PL), which is very liberal and allows you to use it in a commercial product. Also see "How does MS-PL license work?" and Microsoft Public License (Ms-PL).
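
For the meta tag use case from the question, a minimal sketch with HtmlAgilityPack might look like the following. It assumes the HtmlAgilityPack assembly is referenced in the project; MetaTagReader and ReadMetaTags are hypothetical names chosen for illustration, and the code sticks to C# 2.0 syntax (no var, no LINQ).

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class MetaTagReader
{
    // Returns a name -> content map for every <meta name="..."> tag in the HTML.
    public static Dictionary<string, string> ReadMetaTags(string html)
    {
        Dictionary<string, string> result =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html); // accepts malformed markup that XmlDocument would reject

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection metas = doc.DocumentNode.SelectNodes("//meta[@name]");
        if (metas == null)
            return result;

        foreach (HtmlNode meta in metas)
        {
            string name = meta.GetAttributeValue("name", "");
            string content = meta.GetAttributeValue("content", "");
            if (name.Length > 0)
                result[name] = content; // a later duplicate overwrites an earlier one
        }
        return result;
    }
}
```

Calling MetaTagReader.ReadMetaTags(html) on a downloaded page would then give you something like tags["description"]. Whether the library's size is acceptable for your project is a separate trade-off.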

Upvotes: 5
