ulrichb

Reputation: 20054

How to extract XML from the WebBrowser control?

I want the same as WebBrowser.Document.Body.InnerHtml, but as an XML representation.

Upvotes: 2

Views: 2869

Answers (3)

tjmoore

Reputation: 1084

Are you using WebBrowser to browse an XML document and want to get to that XML in code, or are you trying to browse to an HTML page and represent HTML as XML?

If the former, you can likely just get the raw text from the WebBrowser (perhaps via InnerText rather than InnerHtml) and parse it as XML.

If the latter, the problem is that HTML isn't XML (unless it's XHTML).

You can convert it to XML with 'tidy' tools, but the accuracy of the result depends on how well-formed the original HTML is.
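
For the first case (the browser is showing an XML document), a minimal sketch of the "grab the text and parse it" idea might look like the following. The class name BrowserXml and the method TryParse are made up for illustration; only XDocument.Parse and XmlException are real .NET APIs. It assumes the text you pull out of the control is the XML itself, which may not hold if the browser adds its own display markup.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of the WebBrowser API.
static class BrowserXml
{
    // Tries to parse text taken from the browser as XML.
    // Returns null when the text is empty or not well-formed XML.
    public static XDocument TryParse(string rawText)
    {
        if (string.IsNullOrWhiteSpace(rawText))
            return null;

        try
        {
            // Trim first: whitespace before an XML declaration is a parse error.
            return XDocument.Parse(rawText.Trim());
        }
        catch (XmlException)
        {
            // Not well-formed (for example plain HTML); the caller decides what to do.
            return null;
        }
    }
}
```

Usage would be something like `XDocument xml = BrowserXml.TryParse(webBrowser1.Document.Body.InnerText);`, where webBrowser1 is an assumed WebBrowser instance. A null result suggests the text is not well-formed XML, in which case the second case applies and a tidy-style conversion is needed first.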

Upvotes: 3

Sheng Jiang 蒋晟

Reputation: 15271

IE's document has an expando property named "XMLDocument". You can access it via its IDispatchEx interface.

You can get the document's COM interface via Document.DomDocument.

Upvotes: 0

Winston Smith

Reputation: 21902

TidyCOM will clean up HTML to XHTML.

Here's how to use it from C#.

Upvotes: 0
