Reputation: 33
I'm fairly new to cURL and XPath, so I'm still learning the ins and outs. I have written a scraper, but when I try to print the scraped data from an array, nothing shows up. What is wrong with my code?
<?php
ini_set("display_errors", "1");
error_reporting(-1);
error_reporting(E_ERROR);
libxml_use_internal_errors(true);
//Basic Function
function get_url_contents($url, $timeout = 10, $userAgent = 'Mozilla/5.0(Macintosh; U; Intel Mac OS X 10_5_8; en-US)AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.215 Safari/534.10'){
$rawhtml = curl_init();//handler
curl_setopt($rawhtml, CURLOPT_URL,$url);//url
curl_setopt($rawhtml, CURLOPT_RETURNTRANSFER, 1);//return result as string rather than direct output
curl_setopt($rawhtml, CURLOPT_CONNECTTIMEOUT,$timeout);//set timeout
curl_setopt($rawhtml, CURLOPT_USERAGENT,$userAgent);//set user agent
$output = curl_exec($rawhtml);//execute curl call
curl_close($rawhtml);//close connection
if(!$output){
return -1;//if nothing obtained, return -1
}
return $output;
}
//get raw html
$html_string = get_url_contents("http://www.beursgorilla.nl/fonds-informatie.asp?naam=Aegon&cat=koersen&subcat=1&instrumentcode=955000020");//url here
//load HTML into DOM object
//ref http://www.php.net/manual/en/domdocument.loadhtml.php
//note html does not have to be well formed with this function
$dom_object = new DOMDocument();
@$dom_object->loadHTML($html_string);
//perform Xpath queries on DOM
//ref http://www.php.net/manual/en/domxpath.query.php
$xpath = new DOMXPath($dom_object);
//perform Xpath query
//use any specific property to narrow focus
$nodes = $xpath->query("//table[@class='maintable']/tbody/tr[4]/td[2]/table[@class='koersen_tabel']/tbody/tr[2]/td[@class='koersen_tabel_midden']");
//setup some basic variables
$i = -1; //$i = counter
//when processing nodes as below, we cycle through them
//but not grabbing data from the header row of the table
$result = array();
//perform xpath subqueries to get numbers
foreach($nodes as $node){
$i++;
//using each 'node' as the limit for the new xpath to search within
//make queries relative by starting them with a dot (e.g. ".//...")
$details = $xpath->query("//table[3]/tbody/tr/td[1]/table[@class='fonds_info_koersen_links']/tbody/tr[1]/td[2]", $node);
foreach($details as $detail){
$result[$i][''] = $detail->nodeValue;
}
$details = $xpath->query("//table[3]/tbody/tr/td[1]/table[@class='fonds_info_koersen_links']/tbody/tr[4]/td[2]", $node);
foreach($details as $detail){
$result[$i][''] = $detail->nodeValue;
}
if(curl_errno($rawhtml)){
echo 'Curl error: ' . curl_error($rawhtml);
print'<pre>';
print_r($result);
print '</pre>';
}
}
?>
I have checked the XPath queries with Chrome's element inspector and they seem to be correct. I really don't know what is wrong with the code.
Upvotes: 1
Views: 300
Reputation: 33
I have since rewritten my crawler using PHP Simple HTML DOM Parser. That fixed my problem, and everything works now :).
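For comparison, the DOMDocument/DOMXPath approach from the question can probably be made to work as well. The following is only a minimal sketch: the markup and the class names naam and koers are invented stand-ins, not the real page. It differs from the original code in three ways. The path has no tbody steps, because browsers such as Chrome insert tbody when they build the DOM, so a path copied from the inspector may not match the HTML the server actually sent. The sub-queries start with a dot so they stay inside the current row. And the result is printed unconditionally instead of inside a curl_errno() check.

```php
<?php
// Invented markup standing in for the real page. Note that it has no <tbody>,
// which is common in HTML as served (browsers add it when building the DOM).
$html = '<html><body>
<table class="koersen_tabel">
  <tr><th>Naam</th><th>Koers</th></tr>
  <tr><td class="naam">Aegon</td><td class="koers">4,51</td></tr>
  <tr><td class="naam">Ahold</td><td class="koers">13,02</td></tr>
</table>
</body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$result = array();
// "//tr" matches rows whether or not a <tbody> sits in between;
// "[td]" skips the header row, which only contains <th> cells.
foreach ($xpath->query("//table[@class='koersen_tabel']//tr[td]") as $row) {
    $entry = array();
    // The leading dot makes the sub-query relative to the current row.
    foreach ($xpath->query("./td", $row) as $cell) {
        $entry[] = trim($cell->nodeValue);
    }
    $result[] = $entry;
}

print_r($result);
```

If a query still returns nothing on the real page, checking $nodes->length right after the first query should show whether the outer path matched at all.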
Upvotes: 0
Reputation: 3566
What about this line of code?
$result[$i][''] = $detail->nodeValue;
Shouldn't this look like:
$result[$i][] = $detail->nodeValue;
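(Note the square brackets: with [''] every value is stored under the same empty-string key and overwrites the previous one, while [] appends a new element.)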
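A small sketch of the difference, using made-up values in place of the scraped node values ($values, $keyed and $appended are hypothetical names for illustration only):

```php
<?php
// Hypothetical stand-ins for the node values the scraper would collect.
$values = array('4,51', '4,48');

$keyed = array();
$appended = array();
$i = 0;

foreach ($values as $value) {
    $keyed[$i][''] = $value;  // same key '' each time: the earlier value is overwritten
    $appended[$i][] = $value; // no key: PHP appends under the next integer index
}

print_r($keyed);    // one element, stored under the key ''
print_r($appended); // two elements, indexed 0 and 1
```

With [''] only the last value assigned for each $i survives, so the second query's result would replace the first one.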
Upvotes: 1