Ruwanka De Silva

Reputation: 3755

Use explorer.document as source HtmlDocument for HtmlAgilityPack

I want to use the web page currently loaded in Internet Explorer as the HtmlDocument for HtmlAgilityPack. I access the explorer's document through mshtml as a COM object:

mshtml.HTMLDocument doc = explorer.Document as mshtml.HTMLDocument;

Then I tried to convert it to the HtmlDocument type used by HtmlAgilityPack:

HtmlAgilityPack.HtmlDocument hdoc = (HtmlAgilityPack.HtmlDocument)doc;

But this fails with an invalid cast exception. The exception message is shown below.

Exception Message

I want to use the currently loaded page as the source for HtmlAgilityPack. I know I can use the HtmlWeb class provided by HtmlAgilityPack to load the current URL, but I want to highlight elements in the loaded page (elements found using HtmlAgilityPack), and I don't think that can be done with that approach. Any ideas on how to implement this would be appreciated. Thanks.

Upvotes: 2

Views: 1502

Answers (1)

jessehouwing

Reputation: 114857

Of course you can't cast between mshtml.HTMLDocument and HtmlAgilityPack.HtmlDocument. They're completely distinct classes from different libraries: HtmlAgilityPack.HtmlDocument is purely managed, while mshtml.HTMLDocument is a managed COM wrapper.

What you can do is grab the HTML from the mshtml.HTMLDocument and load it into the Agility Pack.

Probably something along these lines:

  // Get the live document's current markup as a string.
  mshtml.IHTMLDocument3 sourceDoc = (mshtml.IHTMLDocument3)explorer.Document;
  string documentContents = sourceDoc.documentElement.outerHTML;

  // Parse that markup into a separate Agility Pack document.
  HtmlAgilityPack.HtmlDocument targetDoc = new HtmlAgilityPack.HtmlDocument();
  targetDoc.LoadHtml(documentContents);

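You could also query the document for IPersistStream, call its Save method to write the page to a stream, and then feed that stream's contents to the HtmlAgilityPack. Note that Save expects a COM IStream, so a .NET MemoryStream would need a wrapper around it.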
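
Since the Agility Pack document is only a parsed copy, highlighting has to happen on the live mshtml document. One way to bridge the two might be to look up each Agility Pack match in the live page by its id attribute. This is a rough sketch, not a tested implementation: the Highlighter class, the HighlightMatches helper and the XPath in the usage line are made up for illustration, and it assumes the elements of interest carry an id.

```csharp
using HtmlAgilityPack;

static class Highlighter
{
    // Hypothetical helper: finds nodes by XPath in an Agility Pack copy of the
    // page, then highlights the live elements that share the same id.
    // Returns the number of live elements that were highlighted.
    public static int HighlightMatches(mshtml.IHTMLDocument3 liveDoc, string xpath)
    {
        // Parse a snapshot of the live page's current markup.
        var snapshot = new HtmlAgilityPack.HtmlDocument();
        snapshot.LoadHtml(liveDoc.documentElement.outerHTML);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = snapshot.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return 0;

        int highlighted = 0;
        foreach (HtmlNode node in nodes)
        {
            // Assumption: an id is the only link between the copy and the live page.
            string id = node.GetAttributeValue("id", "");
            if (id.Length == 0) continue;

            mshtml.IHTMLElement live = liveDoc.getElementById(id);
            if (live == null) continue;

            live.style.backgroundColor = "yellow";
            highlighted++;
        }
        return highlighted;
    }
}
```

It could be called as `Highlighter.HighlightMatches((mshtml.IHTMLDocument3)explorer.Document, "//div[@id]");`. Elements without an id are skipped here, so matching them would need some other way of locating the corresponding live element.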

Upvotes: 4
