Reputation: 3402
Do you know of a web page scraping library for Delphi, like Beautiful Soup or Scrapy for Python?
Upvotes: 7
Views: 7017
Reputation: 214
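After the page has been loaded into a TWebBrowser component, query the TWebBrowser.Document property for the IHTMLDocument2 interface (or IHTMLDocument3, which is the interface that declares getElementById, getElementsByTagName and getElementsByName), and then you can enumerate the elements.
You can use getElementById, getElementsByTagName and getElementsByName, for example: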
var
  Elem: IHTMLElement;
begin
  // MSHTML must be in the uses clause; getElementById is declared on IHTMLDocument3
  Elem := (WebBrowser1.Document as IHTMLDocument3).getElementById('myid');
end;
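To enumerate elements rather than fetch a single one, the same interface can be queried for a collection. The sketch below is a minimal example, assuming a VCL unit with a TWebBrowser whose page has already finished loading; the helper name ListLinks and its Lines parameter are hypothetical, chosen only for illustration.

```pascal
uses
  SysUtils, Classes, Variants, SHDocVw, MSHTML;

// Hypothetical helper: adds the href of every <a> element in the loaded page to Lines.
// Assumes the document has finished loading (e.g. called from OnDocumentComplete).
procedure ListLinks(WebBrowser: TWebBrowser; Lines: TStrings);
var
  Doc: IHTMLDocument3;
  Anchors: IHTMLElementCollection;
  Elem: IHTMLElement;
  I: Integer;
begin
  // getElementsByTagName is declared on IHTMLDocument3, so query for that interface
  if not Supports(WebBrowser.Document, IHTMLDocument3, Doc) then
    Exit; // no document loaded, or not an HTML document
  Anchors := Doc.getElementsByTagName('a');
  for I := 0 to Anchors.length - 1 do
  begin
    Elem := Anchors.item(I, 0) as IHTMLElement;
    // VarToStr turns a missing (null) attribute into an empty string
    Lines.Add(VarToStr(Elem.getAttribute('href', 0)));
  end;
end;
```

For example, calling ListLinks(WebBrowser1, Memo1.Lines) once the page is loaded would fill a (hypothetical) memo with one link per line.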
Or get the whole HTML text and process it however you like, for example:
sourceHTML := WebBrowser1.Document as IHTMLDocument2;  // sourceHTML: IHTMLDocument2
HTML := sourceHTML.body.innerHTML;  // HTML: string; only the markup inside <body>
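As a rough illustration of processing the raw text, here is a deliberately naive sketch that pulls double-quoted href values out of an HTML string with plain string searching. It is not an HTML parser: it ignores single-quoted or unquoted attributes and comments, and it also matches attributes ending in href (such as data-href). The program and the ExtractHrefs name are hypothetical; it should compile with Delphi or with Free Pascal in Delphi mode.

```pascal
program ExtractLinks;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils, StrUtils, Classes;

// Naive extractor: adds the value of every href="..." found in HTML to Links.
procedure ExtractHrefs(const HTML: string; Links: TStrings);
const
  Marker = 'href="';
var
  Lower: string;
  StartPos, EndPos: Integer;
begin
  // LowerCase only changes ASCII letters, so positions in Lower match those in HTML
  Lower := LowerCase(HTML);
  StartPos := PosEx(Marker, Lower, 1);
  while StartPos > 0 do
  begin
    Inc(StartPos, Length(Marker));          // first character of the value
    EndPos := PosEx('"', HTML, StartPos);   // closing quote
    if EndPos = 0 then
      Break;                                // unterminated value: stop
    Links.Add(Copy(HTML, StartPos, EndPos - StartPos));
    StartPos := PosEx(Marker, Lower, EndPos + 1);
  end;
end;

var
  Links: TStringList;
  I: Integer;
begin
  Links := TStringList.Create;
  try
    ExtractHrefs('<p><a href="/questions">Q</a> <A HREF="https://example.com/">E</A></p>', Links);
    for I := 0 to Links.Count - 1 do
      Writeln(Links[I]);  // prints /questions, then https://example.com/
  finally
    Links.Free;
  end;
end.
```

In a real application the string would come from innerHTML; for anything beyond trivial pages a proper parser is the safer choice.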
Upvotes: 1
Reputation: 16937
Well, it's not for Delphi but for FreePascal, since I do not have a recent Delphi version; porting between the two should not be too difficult, though.
Anyway, my Internet Tools are probably the best Pascal web scraping library out there.
For example, you can print all links on a page with:
uses simpleinternet, xquery;

var a: IXQValue;
begin
  // download the page and evaluate the XPath expression on it
  for a in process('http://stackoverflow.com', '//a/@href') do
    writeln(a.toString);  // one href value per line
end.
They are platform-independent; have full support for XPath 2, XQuery, CSS 3 selectors (those are not as well tested, though; XPath is better anyway) and pattern matching; parse XML and HTML; and download over HTTP and HTTPS.
Upvotes: 12