user586011

Reputation: 1968

How to find urls in images

I am trying to extract URLs from a large number of Google search results. Getting them from the page source is proving quite challenging: the delimiters are not clear, and not all of the URLs are in the source. Is there a tool that can extract URLs from a certain area of an image? If so, that may be a better solution.

Any help would be much appreciated.

Upvotes: 0

Views: 422

Answers (2)

guillaumepotier

Reputation: 7448

Use this excellent lib: http://simplehtmldom.sourceforge.net/manual.htm

// Load the library (adjust the path to wherever simple_html_dom.php lives)
include 'simple_html_dom.php';

// Grab the source code
$html = file_get_html('http://www.google.com/');

// Find all anchors; returns an array of element objects
$anchors = $html->find('a');

// find() returned an array, so read the href attribute of each anchor in turn
foreach ($anchors as $a) {
    echo $a->href . "\n";
}

Edit:

All "natural" search URLs seem to be in the #res div. With simplehtmldom, find the first #res, then all the anchors inside it. I don't remember the exact syntax, but it should be something like this:

// find() with an index returns a single element; without one it returns an array
$ret = $html->find('div[id=res]', 0)->find('a');

or maybe

$ret = $html->find('div[id=res] a');
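
If pulling in the library is not an option, the same "anchors inside #res" idea can be sketched with PHP's built-in DOM extension instead. This is a different technique from simplehtmldom, not its API. The sample markup and the function name extract_result_urls are made up for illustration, and whether Google's results still live in a #res div is an assumption carried over from the note above.

```php
<?php
// Hypothetical sample markup standing in for a downloaded results page.
$sample = '<html><body>'
    . '<a href="http://example.com/nav">nav</a>'
    . '<div id="res">'
    . '<a href="http://example.com/one">One</a>'
    . '<p><a href="http://example.com/two">Two</a></p>'
    . '</div></body></html>';

// Return the href of every anchor nested inside the element with id="res".
function extract_result_urls(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $urls = [];
    // "//a" after the div matches anchors at any depth below it.
    foreach ($xpath->query('//div[@id="res"]//a[@href]') as $anchor) {
        $urls[] = $anchor->getAttribute('href');
    }
    return $urls;
}

$urls = extract_result_urls($sample);
print_r($urls);
```

The navigation link outside the div is skipped, so only the two links inside #res end up in $urls.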

Upvotes: 0

Emil Stenström

Reputation: 14106

Try using the JSON/Atom Custom Search API instead: http://code.google.com/apis/customsearch/v1/overview.html. It gives you 100 API calls per day, which you can increase to 10,000 per day if you pay.
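
A rough sketch of what using the API could look like in PHP: build the request URL, then pull the link field out of each entry in the response's items array. The helper names build_search_url and extract_links, the placeholder key and engine ID, and the trimmed sample response are all made up for illustration. No live request is made here.

```php
<?php
// Build the request URL for the Custom Search JSON API.
// $apiKey and $engineId are placeholders for your own credentials.
function build_search_url(string $apiKey, string $engineId, string $query, int $start = 1): string
{
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query([
        'key'   => $apiKey,
        'cx'    => $engineId,
        'q'     => $query,
        'start' => $start, // index of the first result to return
    ]);
}

// Collect the "link" field of each entry in the response's "items" array.
function extract_links(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['items']) || !is_array($data['items'])) {
        return []; // no results, or not the JSON shape we expected
    }
    $links = [];
    foreach ($data['items'] as $item) {
        if (isset($item['link'])) {
            $links[] = $item['link'];
        }
    }
    return $links;
}

// Hypothetical, heavily trimmed response body used in place of a live call.
$sampleJson = '{"items":['
    . '{"title":"One","link":"http://example.com/one"},'
    . '{"title":"Two","link":"http://example.com/two"}]}';

$url = build_search_url('YOUR_API_KEY', 'YOUR_ENGINE_ID', 'site scraping');
$links = extract_links($sampleJson);
print_r($links);
```

For a real call, the body could be fetched with something like file_get_contents($url) (assuming allow_url_fopen is enabled) and passed to extract_links. Since the API returns structured JSON, there are no delimiters to guess at.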

Upvotes: 1
