philnext

Reputation: 3402

Web page scraping in Delphi

Do you know of a library for web page scraping in Delphi, like Beautiful Soup or Scrapy for Python?

Upvotes: 7

Views: 7017

Answers (2)

Leonardo Gregianin

Reputation: 214

Once the page has loaded in a TWebBrowser component, query the TWebBrowser.Document property for the IHTMLDocument2 interface; you can then enumerate the elements.

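You can use getElementById, getElementsByTagName or getElementsByName (all declared on IHTMLDocument3), for example:

// requires MSHTML in the uses clause
var
  Elem: IHTMLElement;
begin
  // cast the document to IHTMLDocument3 and look the element up by its id;
  // the result is nil when no element has that id
  Elem := (WebBrowser1.Document as IHTMLDocument3).getElementById('myid');
end;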

Or get the HTML of the page body as text and process it however you want, for example:

// Doc is an IHTMLDocument2, SourceHTML is a string
Doc := WebBrowser1.Document as IHTMLDocument2;
SourceHTML := Doc.body.innerHTML;
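
Enumerating elements, as mentioned at the top of this answer, could look roughly like the sketch below. It collects the href of every link on the loaded page. The unit name LinkScraper and the procedure ListLinks are made up for illustration, and it assumes you call it only after the document has finished loading (for example from the OnDocumentComplete event).

```pascal
unit LinkScraper; // hypothetical unit name, for illustration only

interface

uses
  SysUtils, Classes, SHDocVw, MSHTML;

// Adds the href of every <a> element in the loaded page to Output.
procedure ListLinks(WebBrowser: TWebBrowser; Output: TStrings);

implementation

procedure ListLinks(WebBrowser: TWebBrowser; Output: TStrings);
var
  Doc: IHTMLDocument3;
  Links: IHTMLElementCollection;
  Anchor: IHTMLAnchorElement;
  I: Integer;
begin
  // Document is nil, or not an HTML document, until a page has loaded
  if not Supports(WebBrowser.Document, IHTMLDocument3, Doc) then
    Exit;

  Links := Doc.getElementsByTagName('a');
  for I := 0 to Links.length - 1 do
    // item returns an IDispatch; keep only real anchor elements
    if Supports(Links.item(I, 0), IHTMLAnchorElement, Anchor) then
      Output.Add(Anchor.href);
end;

end.
```

Usage would be something like ListLinks(WebBrowser1, Memo1.Lines), assuming a form with a TWebBrowser and a TMemo on it.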

Upvotes: 1

BeniBela

Reputation: 16937

Well, it's not for Delphi but for FreePascal, since I do not have a recent Delphi version, but porting between the two should not be too difficult.

Anyway, my Internet Tools are probably the best Pascal web scraping library out there.

You can, e.g., print all links on a page with:

uses simpleinternet, xquery;

var a: IXQValue;
begin
  for a in process('http://stackoverflow.com', '//a/@href') do
    writeln(a.toString);
end.

They are platform-independent; fully support XPath 2, XQuery, CSS 3 selectors (those are not so well tested, though; XPath is better anyway) and pattern matching; parse XML and HTML; and download over HTTP and HTTPS.

Upvotes: 12
