Reputation: 59
I'm trying to code a "robot" that crawls a forum to compute stats.
Here is my code: https://pastebin.com/6zAaQ0fF
<?php
$ch = curl_init();
$timeout = 0; // set to zero for no timeout
curl_setopt ($ch, CURLOPT_URL, 'http://m.jeuxvideo.com/forums/42-51-61922886-1-0-1-0-once-upon-time-in-hollywood.htm');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($file_contents);
$xpath = new DOMXPath($dom);
$posts = $xpath->query("//div[@class='who-post']");
$dates = $xpath->query("//div[@class='date-post']");
$contenus = $xpath->query("//div[@class='contenu']");
foreach ($posts as $post) {
    $nodes = $post->childNodes;
    foreach ($nodes as $node) {
        $value = trim($node->nodeValue);
        echo $node->nodeValue;
        $tab[] = $node->nodeValue;
    }
}
foreach ($dates as $date) {
    $nodes = $date->childNodes;
    foreach ($nodes as $node) {
        echo trim($node->nodeValue);
    }
}
?>
<pre>
<?php
print_r($tab);
?>
</pre>
I don't understand why I get entries containing only blank space in my array, while the output looks correct when I use echo.
Thank you for your help!
Upvotes: 0
Views: 159
Reputation: 19780
You could select the <a> tag inside each post directly:
$posts = $xpath->query("//div[@class='who-post']/a");
Also, you don't use the trimmed value in the first loop:
$value = trim($node->nodeValue);
$tab[] = $node->nodeValue;
Change to:
$value = trim($node->nodeValue);
$tab[] = $value;
Output:
Array
(
[0] => Thewiitcheur
[1] => Thewiitcheur
[2] => Shaq24
[3] => Downy-down
[4] => LosyCITY
[5] => DanaAndrews
[6] => Racouske
[7] => Gnagngan
[8] => harvey-specter
[9] => frivyhotasmr
[10] => Jowst
[11] => Thewiitcheur
[12] => ChibreCarnivore
[13] => pseudobanni5678
[14] => Chimpanzee
[15] => EncoreBan25
[16] => spagetthivolant
[17] => Chimpanzee
[18] => JeromeGerber
[19] => chopsueys
)
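To illustrate both changes together, here is a minimal, self-contained sketch. The `$html` string is hypothetical sample markup standing in for the forum page, not the real HTML. It assumes the blank entries come from whitespace-only text nodes between tags, which a browser presumably hides when the values are echoed (HTML rendering collapses whitespace) but which `<pre>` shows as-is.

```php
<?php
// Hypothetical sample markup (an assumption, not the real forum page).
$html = <<<HTML
<div class="who-post">
    <a href="#">Thewiitcheur</a>
</div>
<div class="who-post">
    <a href="#">  Shaq24
    </a>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Original approach: collects every child node of the div,
// which can include whitespace-only text nodes around the <a>.
$raw = [];
foreach ($xpath->query("//div[@class='who-post']") as $post) {
    foreach ($post->childNodes as $node) {
        $raw[] = $node->nodeValue;
    }
}

// Suggested approach: select only the <a> elements and store the trimmed text.
$tab = [];
foreach ($xpath->query("//div[@class='who-post']/a") as $link) {
    $tab[] = trim($link->nodeValue);
}

print_r($tab);
```

Comparing `count($raw)` with `count($tab)` is a quick way to check whether extra whitespace nodes were being collected on the real page.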
Upvotes: 1