Reputation: 95
I have the following function, which gets the HTML document's source code after the DocumentComplete event fires.
function TBrowser.GetWebBrowserHTML(const WebBrowser: TWebBrowser): string;
var
  LStream: TStringStream;
  Stream: IStream;
  LPersistStreamInit: IPersistStreamInit;
begin
  Result := '';  // defined return value on early exit or error
  try
    if not Assigned(WebBrowser.Document) then
      Exit;
    LStream := TStringStream.Create('', TEncoding.UTF8);
    try
      LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
      // soReference: the adapter does not own (or free) LStream
      Stream := TStreamAdapter.Create(LStream, soReference);
      LPersistStreamInit.Save(Stream, True);
      Result := LStream.DataString;
    finally
      LStream.Free;
    end;
  except
    // Any exception is swallowed here; Result stays ''
  end;
end;
The problem: the source code is retrieved before the AJAX calls on the page have run. The page finishes loading (as far as the WebBrowser can tell), but AJAX continues to modify the DOM and additional elements appear on the page. What I need is the equivalent of Mozilla's "View Generated Source", or the HTML source that appears when inspecting the web page with Firebug, Chrome Inspector or IE Developer Tools.
It seems that in C# there is a DocumentText property that does this, but I couldn't find any property or method to achieve this in Delphi.
Any ideas/hints/help please?
Upvotes: 1
Views: 4503
Reputation: 116170
You can use the IHTMLDocument2 interface, which the object behind TWebBrowser.Document implements. The property is exposed as an IDispatch, but you can cast it to the interface, or assign it to an (Ole)Variant, although you won't benefit from code completion then.
The document supports the documentElement property (declared on IHTMLDocument3, but reachable through late binding on an OleVariant), which points to the root element of the document. That element, like any other, has an outerHTML property, which gives you the element and all its contents as a string:
var
  d: OleVariant;
begin
  d := WebBrowser1.Document;
  // Late-bound call: resolved at run time through IDispatch
  ShowMessage(d.documentElement.outerHTML);
end;
As far as I can see, this is the actual state of the document, including any changes made by JavaScript.
It doesn't seem to include the doctype, but then again, if I find the doctype element through WebBrowser1.Document.All, its outerHTML property doesn't return anything. Other parts of the document are also changed (tag names in capitals, for one), but that only confirms that this is a generated document structure based on the loaded DOM, rather than the original source of the document.
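If you prefer early binding (and code completion) over the OleVariant approach, the same idea can be sketched with the typed interfaces from the MSHTML unit. This is only a minimal sketch: the function name GetGeneratedHTML is made up for illustration, and it assumes documentElement is reached through IHTMLDocument3, which is where the MSHTML type library declares it.

```delphi
uses
  SysUtils, SHDocVw, MSHTML;

// Hypothetical helper (name is an assumption, not from the original post).
// Returns the current DOM serialized as HTML, or '' if unavailable.
function GetGeneratedHTML(const WebBrowser: TWebBrowser): string;
var
  Doc3: IHTMLDocument3;
  Root: IHTMLElement;
begin
  Result := '';
  if not Assigned(WebBrowser.Document) then
    Exit;
  // Supports() returns False instead of raising when the loaded
  // document does not implement IHTMLDocument3 (e.g. a non-HTML document).
  if Supports(WebBrowser.Document, IHTMLDocument3, Doc3) then
  begin
    Root := Doc3.documentElement;
    if Assigned(Root) then
      Result := Root.outerHTML;  // reflects the DOM at the time of the call
  end;
end;
```

Like the late-bound version, this returns the DOM as it is at the moment of the call, so it still has to be called after the page's scripts have finished modifying the document.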
Upvotes: 2