I often use XPath with PHP to parse pages, but this time I don't understand the behavior I get with this particular page and the following code. I hope you can help.
Here is the code I use to parse the page http://www.jeuxvideo.com/recherche.php?m=9&t=10&q=Call+of+duty:
<?php
$What = 'Call of duty';
$What = urlencode($What);
$Query = 'http://www.jeuxvideo.com/recherche.php?m=9&t=10&q='.$What;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Query);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
$response = curl_exec($ch);
curl_close($ch);
/*
$search = array("<article", "</article>");
$replace = array("<div", "</div>");
$response = str_replace($search, $replace, $response);
*/
$dom = new DOMDocument();
@$dom->loadHTML($response);
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//article[@class="recherche-aphabetique-item"]/a');
//$elements = $xpath->query('//div[@class="recherche-aphabetique-item"]/a');
count($elements);
var_dump($elements);
?>
Fiddle to test it: http://phpfiddle.org/main/code/r9n6-d0j0
I just want to get all "a" nodes that are direct children of "article" nodes with the class "recherche-aphabetique-item".
But the query returns nothing.
As you can see in the commented-out code, I also tried replacing the HTML5 "article" elements with "div" elements, but I got the same result.
Thanks for your help.
I'm seeing lots of "DOMDocument::loadHTML(): Unexpected end tag" errors; you should perhaps use libxml's internal error handling functions to deal with them. Also, when I looked at the DOM of the remote site, I could not see any "a" tags that would match the XPath query, only "span" tags:
<?php
$What = 'Call of duty';
$What = urlencode($What);
$Query = 'http://www.jeuxvideo.com/recherche.php?m=9&t=10&q='.$What;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Query);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
$response = curl_exec($ch);
curl_close($ch);
/* collect libxml errors internally instead of emitting them as PHP warnings */
libxml_use_internal_errors( true );
$dom = new DOMDocument();
/* additional flags for DOMDocument */
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->formatOutput=false;
@$dom->loadHTML($response);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//article[@class="recherche-aphabetique-item"]/span');
count( $elements );
var_dump( $elements );
?>
Output:
object(DOMNodeList)#97 (1) { ["length"]=> int(94) }
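Rather than only hiding the parser errors, libxml's internal error handling also lets you inspect them. A minimal sketch, assuming a small made-up HTML sample (not the real page) with a deliberately stray closing tag; the exact messages you get depend on the libxml2 version PHP is built against:

```php
<?php
// Sketch: collect libxml's parse errors and look at them instead of silencing
// them with @. The HTML string is a hypothetical sample, not the real page;
// the stray </div> is there on purpose so the parser has something to report.
$html = '<html><body><article class="demo"><span>Item</span></div></article></body></html>';

libxml_use_internal_errors(true);  // buffer errors instead of emitting PHP warnings
$dom = new DOMDocument();
$dom->loadHTML($html);

$messages = [];
foreach (libxml_get_errors() as $error) {
    // each entry is a LibXMLError object with level, line and message properties
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();             // empty the buffer so the next parse starts clean

echo implode("\n", $messages), "\n";
```

Because the errors are buffered, the `@` operator is no longer needed in front of `loadHTML()`.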
You could perhaps simplify this further by letting DOMDocument load the URL itself with loadHTMLFile() instead of using cURL:
<?php
$What = 'Call of duty';
$What = urlencode($What);
$Query = 'http://www.jeuxvideo.com/recherche.php?m=9&t=10&q='.$What;
libxml_use_internal_errors( true );
$dom = new DOMDocument();
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->formatOutput=false;
@$dom->loadHTMLFile($Query);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//article[@class="recherche-aphabetique-item"]/span');
echo $elements->length, ' results<br />';
foreach( $elements as $node ) {
    echo $node->nodeValue, '<br />';
}
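To check that the XPath expression itself is fine and the difference lies only in the markup, here is a sketch run against a small hypothetical fragment that imitates the structure described above ("article" elements containing "span" children). It also shows that the parser keeps the HTML5 "article" elements, so replacing them with "div" is not needed:

```php
<?php
// Sketch: "/a" only matches direct <a> children of the selected <article>.
// The markup is a hypothetical, simplified stand-in for the search page.
$html = <<<HTML
<html><body>
  <article class="recherche-aphabetique-item"><span>Call of Duty</span></article>
  <article class="recherche-aphabetique-item"><span>Call of Duty 2</span></article>
</body></html>
HTML;

libxml_use_internal_errors(true);  // older libxml2 versions complain about <article>
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$base  = '//article[@class="recherche-aphabetique-item"]';

$links = $xpath->query($base . '/a');     // what the question asked for
$spans = $xpath->query($base . '/span');  // what the page actually contains

echo $links->length, "\n";  // 0: there are no <a> children
echo $spans->length, "\n";  // 2
foreach ($spans as $span) {
    echo $span->nodeValue, "\n";
}
```

If the live page ever changes its markup, comparing the two counts like this is a quick way to see which child element the query should target.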