Maio Steger

Reputation: 35

Scraping images from a website in Delphi with TWebBrowser

I'm trying to make a small tool that downloads all images from the visited site. It has to be made with the TWebBrowser component. The test site from my customer is Click. At the moment I'm selecting the pictures with getElementById, but some of the pictures don't have an id. How can I address the ones without an id? Thanks a lot.

Upvotes: 0

Views: 1618

Answers (2)

Remy Lebeau

Reputation: 598279

After the page is loaded, query the TWebBrowser.Document property for the IHTMLDocument2 interface, and then you can enumerate the elements of the IHTMLDocument2.images collection:

var
  Document: IHTMLDocument2;
  Images: IHTMLElementCollection;
  Image: IHTMLImgElement;
  I: Integer;
begin
  Document := WebBrowser1.Document as IHTMLDocument2;
  Images := Document.images;
  for I := 0 to Images.length - 1 do
  begin
    Image := Images.item(I, '') as IHTMLImgElement;
    // use Image as needed...
  end;
end;
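
As a rough illustration of "use Image as needed" for the download case in the question, the sketch below collects each image's src and saves it with URLDownloadToFile. The routine names (CollectImageUrls, DownloadImages), the TargetFolder parameter, and the image0, image1, ... file naming are hypothetical placeholders, not part of the original answer, and newer Delphi versions may need scoped unit names (for example Winapi.UrlMon).

```delphi
uses
  Windows, SysUtils, Classes, SHDocVw, MSHTML, UrlMon;

// Hypothetical helper: adds the src of every <img> in the loaded page to Urls.
procedure CollectImageUrls(Browser: TWebBrowser; Urls: TStrings);
var
  Document: IHTMLDocument2;
  Images: IHTMLElementCollection;
  Image: IHTMLImgElement;
  Src: string;
  I: Integer;
begin
  // Browser.Document is nil until a page has been loaded.
  if not Supports(Browser.Document, IHTMLDocument2, Document) then
    Exit;
  Images := Document.images;
  for I := 0 to Images.length - 1 do
    if Supports(Images.item(I, ''), IHTMLImgElement, Image) then
    begin
      Src := Image.src;
      if Src <> '' then
        Urls.Add(Src);
    end;
end;

// Hypothetical helper: downloads each URL and returns how many succeeded.
// The file naming is a placeholder; a real tool would derive a proper
// name and extension from the URL or from the response.
function DownloadImages(Urls: TStrings; const TargetFolder: string): Integer;
var
  I: Integer;
  FileName: string;
begin
  Result := 0;
  for I := 0 to Urls.Count - 1 do
  begin
    FileName := IncludeTrailingPathDelimiter(TargetFolder) + Format('image%d', [I]);
    if URLDownloadToFile(nil, PChar(Urls[I]), PChar(FileName), 0, nil) = S_OK then
      Inc(Result);
  end;
end;
```

One way to use this would be to call CollectImageUrls(WebBrowser1, SomeStringList) from the browser's OnDocumentComplete handler, so the document is already available, and then pass the list to DownloadImages.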

Note that this will only find images in HTML <img> tags. If you need to find images in <input type="image"> tags as well, you will have to enumerate the elements of the IHTMLDocument2.all collection, looking for instances of the IHTMLInputElement interface whose type property is "image", e.g.:

var
  Document: IHTMLDocument2;
  Elements: IHTMLElementCollection;
  Element: IHTMLElement;
  Image: IHTMLImgElement;
  Input: IHTMLInputElement;
  I: Integer;
begin
  Document := WebBrowser1.Document as IHTMLDocument2;
  Elements := Document.all;
  for I := 0 to Elements.length - 1 do
  begin
    Element := Elements.item(I, '') as IHTMLElement;
    // "is" cannot test for an interface; use SysUtils.Supports instead
    if Supports(Element, IHTMLImgElement, Image) then
    begin
      // use Image as needed...
    end
    else if Supports(Element, IHTMLInputElement, Input) then
    begin
      // MSHTML.pas exposes the reserved word "type" as type_
      if SameText(Input.type_, 'image') then
      begin
        // use Input as needed...
      end;
    end;
  end;
end;

Upvotes: 3

Dan Barclay

Reputation: 39

Instead of requesting a specific element by id, you can "walk" the document and look at each element using WebDocument.all.item(itemnum,'').

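var
  cAllElements: IHTMLElementCollection;
  eThisElement: IHTMLElement;
  WebDocument: IHTMLDocument2;
  iThisElement: Integer;
begin
  // WebBrowser1 is assumed to be the TWebBrowser holding the loaded page
  WebDocument := WebBrowser1.Document as IHTMLDocument2;
  cAllElements := WebDocument.all;
  for iThisElement := 0 to cAllElements.length - 1 do
  begin
    eThisElement := cAllElements.item(iThisElement, '') as IHTMLElement;
    // check out eThisElement and do what you want
  end;
end;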

You would then check each element's tagName for IMG, or do whatever other assessment you need to determine whether it is a picture, and handle it as you did before.
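
A minimal sketch of that tagName check, assuming the document has already been obtained as an IHTMLDocument2. The procedure name CollectImgTagSources and the Sources list are hypothetical; the src is read through IHTMLElement.getAttribute, which should yield Null when the attribute is absent, hence the VarToStr guard.

```delphi
uses
  SysUtils, Classes, Variants, MSHTML;

// Hypothetical helper: walks every element and records the src attribute
// of those whose tag name is IMG.
procedure CollectImgTagSources(const WebDocument: IHTMLDocument2; Sources: TStrings);
var
  cAllElements: IHTMLElementCollection;
  eThisElement: IHTMLElement;
  iThisElement: Integer;
  sSrc: string;
begin
  if WebDocument = nil then
    Exit;
  cAllElements := WebDocument.all;
  for iThisElement := 0 to cAllElements.length - 1 do
  begin
    if not Supports(cAllElements.item(iThisElement, ''), IHTMLElement, eThisElement) then
      Continue;
    // Compare case-insensitively rather than relying on the tag's casing.
    if SameText(eThisElement.tagName, 'IMG') then
    begin
      // VarToStr turns a Null result (missing attribute) into ''.
      sSrc := VarToStr(eThisElement.getAttribute('src', 0));
      if sSrc <> '' then
        Sources.Add(sSrc);
    end;
  end;
end;
```

Because this approach never looks an element up by id, images without an id are picked up the same way as the others.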

Dan

Upvotes: 0
