Reputation: 11
I'm having trouble getting all the element IDs of a web page loaded in a TWebBrowser control. Could someone help me achieve this? I want to save the element IDs in a listbox or similar, but it would be enough to know how to get all of them.
Thanks a lot!
Upvotes: 1
Views: 2616
Reputation: 30735
The following will scan a TWebBrowser's contents and put the ID attributes of its nodes into a TMemo.
// Note: you need to add MSHTML to your uses list if it's not there already
procedure TForm1.GetIDs;
var
  All : IHTMLElementCollection;
  Doc : IHtmlDocument2;
  E : IHtmlElement;
  i : Integer;
  S : String;
begin
  Doc := IDispatch(WebBrowser1.Document) as IHtmlDocument2;
  Assert(Doc <> Nil);
  All := IDispatch(Doc.all) as IHTMLElementCollection;
  for i := 0 to All.Length - 1 do begin
    E := IDispatch(All.Item(i, 0)) as IHtmlElement;
    S := IntToStr(i) + ' ' + E.id;
    Memo2.Lines.Add(S);
  end;
end;
This uses the interfaces of the DOM objects declared in MSHTML.pas. There is a lot to Microsoft's DOM object model, and you need to immerse yourself in it for a while to get used to it. See here for a way in:
https://msdn.microsoft.com/en-us/library/aa703928%28v=vs.85%29.aspx#properties
As you can see even from my (over)simple example, using it tends to require quite a lot of hopping around between raw interfaces and ones wrapped in OleVariants, as in
All := IDispatch(Doc.all) as IHTMLElementCollection;
By the way, the above code uses "early binding" to the interfaces declared in MSHTML.pas. A lot of the example code you will see for working with these objects uses OleVariants instead (i.e. "late binding"; see the online help if you're unsure about the difference between early and late binding). The following is a late-bound version of the code above.
procedure TForm1.GetIDs2;
var
  All,
  Doc,
  E : OleVariant;
  i : Integer;
  S : String;
begin
  Doc := WebBrowser1.Document;
  All := Doc.all;
  for i := 0 to All.Length - 1 do begin
    E := All.Item(i);
    S := IntToStr(i) + ' ' + E.id;
    Memo2.Lines.Add(S);
  end;
end;
Generally, late binding is easier for experimenting and getting things working initially, because it lets you omit optional parameters and doesn't require typed interface variables. The downsides are that the IDE offers no code completion on late-bound objects, member names are only checked at run time (so a misspelling compiles but fails when executed), and execution is slower.
Sample HTML:
<html>
<body>
<div ID="adiv" style="TEXT-ALIGN: left; color: Gray">Some text
<div ID="asubdiv" style="TEXT-ALIGN: left; color: Gray">Subdiv</div>
</div>
<div style="TEXT-ALIGN: left; color: Gray">Some more text</div>
<div ID="cdiv" style="TEXT-ALIGN: left; color: Gray">Some even more text</div>
</body>
</html>
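Since the question asked for the IDs in a listbox, here is a minimal early-bound sketch that collects only the non-empty IDs into any TStrings (for example ListBox1.Items). The unit name ElementIds and the procedure name CollectElementIds are made up for this illustration and are not part of MSHTML or the VCL; it also assumes the page has finished loading before it is called.

```delphi
unit ElementIds;

interface

uses
  Classes, SysUtils, SHDocVw, MSHTML;

// Hypothetical helper: fills Ids with the ID of every element that has one.
procedure CollectElementIds(Browser: TWebBrowser; Ids: TStrings);

implementation

procedure CollectElementIds(Browser: TWebBrowser; Ids: TStrings);
var
  Doc: IHTMLDocument2;
  All: IHTMLElementCollection;
  E: IHTMLElement;
  i: Integer;
begin
  Ids.Clear;
  // Document is nil until a page has loaded, and it may not be HTML at all;
  // Supports returns False in both cases instead of raising.
  if not Supports(Browser.Document, IHTMLDocument2, Doc) then
    Exit;
  if not Supports(Doc.all, IHTMLElementCollection, All) then
    Exit;
  Ids.BeginUpdate;
  try
    for i := 0 to All.length - 1 do
      // item returns an IDispatch; skip elements without an ID attribute.
      if Supports(All.item(i, 0), IHTMLElement, E) and (E.id <> '') then
        Ids.Add(E.id);
  finally
    Ids.EndUpdate;
  end;
end;

end.
```

A call such as CollectElementIds(WebBrowser1, ListBox1.Items), made for instance from the browser's OnDocumentComplete handler, should list adiv, asubdiv and cdiv for the sample HTML above, since those are the only elements that carry an ID.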
Upvotes: 2