Reputation: 1749
I'm trying to scrape this web page:
http://prontosoccorso.usl4.toscana.it/attesa/home.asp
using PHP and XPath to get the numbers shown under the red, yellow, green and white colored circles.
(Note: you may see different values if you browse the page yourself. That doesn't matter, since they change dynamically.)
I'm using this PHP sample to print the value:
<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);
$url = 'http://prontosoccorso.usl4.toscana.it/attesa/home.asp';
$xpath_for_parsing = '//*[@id="prontosoccorso"]/tbody/tr[2]/td[2]';
//#Set CURL parameters: pay attention to the PROXY config !!!!
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '');
$data = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument();
@$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$colorWaitingNumber = $xpath->query($xpath_for_parsing);
$theValue = 'N.D.';
foreach( $colorWaitingNumber as $node )
{
$theValue = $node->nodeValue;
}
print $theValue;
?>
The code runs without errors, but the result is always 0!
I've noticed that if I use
$xpath_for_parsing = '//*[@id="prontosoccorso"]';
the result is
Situazione aggiornata al giorno 30/12/2017 alle ore 14:09 Rosso Giallo Verde Azzurro Bianco Pazienti in attesa (totale 0) 0 0 0 0 0 Pazienti in visita (totale 0) 0 0 0 0 0 Pazienti trattati nelle ultime ore 0 0 0 0 0
so the 0 I get for my values is consistent with what the server returns. (If you run curl http://prontosoccorso.usl4.toscana.it/attesa/home.asp
from the command line, you will also see that all the values are zero.)
Inspecting the page with the browser console, I can't find the request that fetches the real values. Any help or suggestions?
Thank you in advance.
Upvotes: 0
Views: 110
Reputation: 57131
One thing to notice is that even when you open that page in a browser, every field starts at 0, which is why I tried loading the page twice. That alone didn't work, so I then made curl store the cookies between calls, and the values started to show up.
The code is mainly what you have, plus extra curl_setopt() calls to create a cookie file (it may be enough to do this once and reuse the file afterwards, but I haven't verified that).
The XPath only fetches the first row of values, but it can easily be adapted for the other rows.
<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);
$url = 'http://prontosoccorso.usl4.toscana.it/attesa/home.asp';
//#Set CURL parameters: pay attention to the PROXY config !!!!
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '');
$cookies = "./cookie.txt";
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookies);
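// First request: the page comes back with zeros, but the server's cookies are kept.
$data = curl_exec($ch);
// Second request on the same handle sends those cookies back; this one should carry the real values.
$data = curl_exec($ch);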
curl_close($ch);
$dom = new DOMDocument();
$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$xpath_for_parsing = '//table[@id="prontosoccorso"]/tbody/tr[2]/td';
$colorWaitingNumber = $xpath->query($xpath_for_parsing);
$theValue = 'N.D.';
foreach( $colorWaitingNumber as $node )
{
echo $theValue = $node->nodeValue.PHP_EOL;
}
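As a rough sketch of how the XPath could be adapted for the other rows, the helper below walks every row of the table instead of only tr[2]. The function name extractTableRows and the choice to read both th and td cells are assumptions for illustration, not something taken from the page's actual markup; adjust them once you have inspected the real HTML.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the trimmed
// text of every cell in the table, one inner array per <tr>.
function extractTableRows(string $html, string $tableId = 'prontosoccorso'): array
{
    if ($html === '') {
        return [];
    }

    $dom = new DOMDocument();
    // Collect parser warnings for sloppy real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows = [];
    // "//tr" matches the rows whether or not the markup contains a <tbody>.
    foreach ($xpath->query('//table[@id="' . $tableId . '"]//tr') as $tr) {
        $cells = [];
        // Relative query: only the header/data cells that belong to this row.
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->nodeValue);
        }
        $rows[] = $cells;
    }
    return $rows;
}
```

Calling extractTableRows($data) after the second curl_exec() would give one array per table row, so the rows after the first can be picked out by index.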
You could add logic that reloads the page only when all the values come back as 0, but this code simply calls curl_exec() twice.
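The "reload only when everything is 0" idea could look roughly like the sketch below. The names allZero and fetchUntilNonZero are hypothetical, and the fetch and extract steps are passed in as callables so the retry logic stays independent of curl and XPath. The attempt limit is an assumption too: all zeros can be a legitimate reading, so the loop must not retry forever.

```php
<?php
// Hypothetical helper: true when every extracted value is the string "0".
function allZero(array $values): bool
{
    foreach ($values as $value) {
        if (trim((string) $value) !== '0') {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: calls $fetch (returns the HTML) and $extract (turns the
// HTML into an array of values) until at least one value is non-zero or
// $maxAttempts is reached. Returns the values from the last attempt.
function fetchUntilNonZero(callable $fetch, callable $extract, int $maxAttempts = 3): array
{
    $values = [];
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $values = $extract($fetch());
        // An empty result is treated like all zeros: try again.
        if ($values !== [] && !allZero($values)) {
            break;
        }
    }
    return $values;
}
```

With the code above, $fetch would wrap curl_exec($ch) on the same handle (before curl_close(), so the cookies are reused), and $extract would run the DOMXPath query and collect each node's nodeValue.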
Upvotes: 1