Bashk

Reputation: 95

Delphi TWebBrowser get HTML source after AJAX load

I have the following function, which retrieves the HTML document's source code after the DocumentComplete event fires.

function TBrowser.GetWebBrowserHTML(const WebBrowser: TWebBrowser): string;
var
  LStream: TStringStream;
  Stream : IStream;
  LPersistStreamInit : IPersistStreamInit;
begin
  Result := '';  // defined return value when there is no document or Save fails
  try
    if not Assigned(WebBrowser.Document) then exit;
    LStream := TStringStream.Create('', TEncoding.UTF8);
    try
      LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
      Stream := TStreamAdapter.Create(LStream,soReference);
      LPersistStreamInit.Save(Stream,true);
      result := LStream.DataString;
    finally
      LStream.Free();
    end;
  except
  end;
end;

The problem: the source code is retrieved before the page's AJAX calls have completed. The page finishes loading (as far as the WebBrowser is concerned), but AJAX continues to modify the DOM and additional elements appear on the page. What I need is the equivalent of Mozilla's "View Generated Source", or the HTML source that appears when inspecting the web page with Firebug, Chrome Inspector or IE Developer Tools.

It seems that in C# there is a DocumentText property that does this, but I couldn't find any property or method to achieve this in Delphi.

Any ideas/hints/help please?

Upvotes: 1

Views: 4503

Answers (1)

GolezTrol

Reputation: 116170

You can use the IHTMLDocument2 interface, which is the interface that TWebBrowser.Document implements. The property is exposed as an IDispatch, but you can cast it to the interface, or to an (Ole)Variant, although you won't benefit from code completion then.

The document object also implements IHTMLDocument3, which is the interface that declares the documentElement property (it is not part of IHTMLDocument2). That property points to the root element of the document. That element (like any other) has the property outerHTML, which gives you the element and all its contents as a string:

var
  d: OleVariant;
begin
  // Late-bound access: the property names are resolved at run time via IDispatch
  d := WebBrowser1.Document;
  ShowMessage(d.documentElement.outerHTML);
end;

As far as I can see, this is the actual state of the document, including any changes that are made by JavaScript.
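
If you prefer the early-bound route (with code completion and compile-time checking), the same read can be sketched roughly as follows. This is only a sketch under assumptions: the unit name GeneratedHtml and the function name GetGeneratedHTML are made up for illustration, and the unscoped unit names rely on the default unit scope names of a VCL project.

```delphi
unit GeneratedHtml;  // hypothetical unit name

interface

uses
  SysUtils, SHDocVw, MSHTML;

// Hypothetical helper: returns the markup of the live DOM,
// or '' when no HTML document is available.
function GetGeneratedHTML(const WebBrowser: TWebBrowser): string;

implementation

function GetGeneratedHTML(const WebBrowser: TWebBrowser): string;
var
  Doc3: IHTMLDocument3;
begin
  Result := '';
  if not Assigned(WebBrowser) then
    Exit;
  // Supports is nil-safe, so an unloaded (nil) Document simply yields ''.
  // documentElement is declared on IHTMLDocument3.
  if Supports(WebBrowser.Document, IHTMLDocument3, Doc3) and
     Assigned(Doc3.documentElement) then
    Result := Doc3.documentElement.outerHTML;
end;

end.
```

Calling it would look like ShowMessage(GetGeneratedHTML(WebBrowser1)); it returns the same string as the late-bound version, just without going through a Variant.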

The result doesn't seem to include the doctype. Then again, if I look up the doctype element through WebBrowser1.Document.All, its outerHTML property doesn't return anything either. Other parts of the document are changed as well (tag names are in capitals, for one), but that only confirms that this is markup generated from the loaded DOM, rather than the original source of the document.
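
Because outerHTML reflects the live DOM, what remains is deciding when to read it. One possible heuristic, which is my assumption and not something the browser control provides, is to poll with a timer and treat the DOM as settled once two consecutive snapshots are identical. The class TDomSettleWatcher below is hypothetical. A page that keeps updating itself will never settle, and a slow AJAX call may make the page look settled too early, so the interval needs tuning per site.

```delphi
unit DomSettleWatcher;  // hypothetical unit name

interface

uses
  SysUtils, ExtCtrls, SHDocVw, MSHTML;

type
  // Hypothetical helper (not part of the VCL): polls the live DOM and hands
  // over the markup once two consecutive snapshots are identical.
  TDomSettleWatcher = class
  private
    FBrowser: TWebBrowser;
    FTimer: TTimer;
    FLastSnapshot: string;
    FOnSettled: TProc<string>;
    procedure TimerTick(Sender: TObject);
  public
    constructor Create(ABrowser: TWebBrowser; AIntervalMs: Cardinal;
      const AOnSettled: TProc<string>);
    destructor Destroy; override;
    procedure Start;
  end;

implementation

constructor TDomSettleWatcher.Create(ABrowser: TWebBrowser;
  AIntervalMs: Cardinal; const AOnSettled: TProc<string>);
begin
  inherited Create;
  FBrowser := ABrowser;
  FOnSettled := AOnSettled;
  FTimer := TTimer.Create(nil);
  FTimer.Enabled := False;        // stays idle until Start is called
  FTimer.Interval := AIntervalMs;
  FTimer.OnTimer := TimerTick;
end;

destructor TDomSettleWatcher.Destroy;
begin
  FTimer.Free;
  inherited;
end;

procedure TDomSettleWatcher.Start;
begin
  FLastSnapshot := '';
  FTimer.Enabled := True;
end;

procedure TDomSettleWatcher.TimerTick(Sender: TObject);
var
  Doc3: IHTMLDocument3;
  Snapshot: string;
begin
  Snapshot := '';
  if Supports(FBrowser.Document, IHTMLDocument3, Doc3) and
     Assigned(Doc3.documentElement) then
    Snapshot := Doc3.documentElement.outerHTML;

  if (Snapshot <> '') and (Snapshot = FLastSnapshot) then
  begin
    // Unchanged since the previous tick: stop polling and report the markup.
    FTimer.Enabled := False;
    if Assigned(FOnSettled) then
      FOnSettled(Snapshot);
  end
  else
    FLastSnapshot := Snapshot;
end;

end.
```

A typical use would be to create the watcher once, for example TDomSettleWatcher.Create(WebBrowser1, 500, procedure(Html: string) begin Memo1.Text := Html; end), where Memo1 is a placeholder control, and to call Start from the DocumentComplete handler.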

Upvotes: 2
