Cesare

Reputation: 1749

Parsing values from an ASP web page using PHP and XPath

I'm trying to scrape this web page ...

http://prontosoccorso.usl4.toscana.it/attesa/home.asp


using PHP and XPath to get the number values under the red, yellow, green and white colored circles.

(NOTE: you may see different values on that page if you browse to it. That doesn't matter; they change dynamically.)

I'm trying to use this PHP code sample to print the value ...

<?php
    ini_set('display_errors', 'On');
    error_reporting(E_ALL);

    $url = 'http://prontosoccorso.usl4.toscana.it/attesa/home.asp';

    $xpath_for_parsing = '//*[@id="prontosoccorso"]/tbody/tr[2]/td[2]';

    //#Set CURL parameters: pay attention to the PROXY config !!!!
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, '');

    $data = curl_exec($ch);
    curl_close($ch);

    $dom = new DOMDocument();
    @$dom->loadHTML($data);

    $xpath = new DOMXPath($dom);

    $colorWaitingNumber = $xpath->query($xpath_for_parsing);
    $theValue =  'N.D.';
    foreach( $colorWaitingNumber as $node )
    {
      $theValue = $node->nodeValue;
    }

    print $theValue;
?>

The code runs fine, but the result is always 0!

I've noticed that if you use

    $xpath_for_parsing = '//*[@id="prontosoccorso"]';

the result is

Situazione aggiornata al giorno 30/12/2017 alle ore 14:09 Rosso Giallo Verde Azzurro Bianco Pazienti in attesa (totale 0) 0 0 0 0 0 Pazienti in visita (totale 0) 0 0 0 0 0 Pazienti trattati nelle ultime ore 0 0 0 0 0

so the result 0 for my values is consistent (also, if you run curl http://prontosoccorso.usl4.toscana.it/attesa/home.asp from the command line, you'll see that the values are all zero).

Analyzing the traffic with the browser console, I can't find the request that gets the real values. Any help or suggestions?

Thank you in advance.

Upvotes: 0

Views: 110

Answers (1)

Nigel Ren

Reputation: 57131

One thing to notice is that even when you open that page in a browser, all the fields start off at 0, which is why I tried loading the page twice. That still didn't work, so I then made it store the cookies between calls, and the values started to turn up.

The code is mainly what you have; the extra curl_setopt() calls create a cookie file (you may be able to do this once and have it always work afterwards, but don't quote me on that).

The XPath only fetches the first row of fields, but it can easily be adapted for the other rows.
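
As a rough sketch of that adaptation, the query can loop over every row and then over the cells of each row. The sample markup and the helper name readTableRows() below are made up for illustration; the real page's table may be laid out differently.

```php
<?php
// Hypothetical sample markup standing in for the real page; the actual
// table may have a different layout.
$html = <<<HTML
<table id="prontosoccorso">
  <tbody>
    <tr><th></th><th>Rosso</th><th>Giallo</th><th>Verde</th></tr>
    <tr><td>Pazienti in attesa</td><td>1</td><td>4</td><td>7</td></tr>
    <tr><td>Pazienti in visita</td><td>0</td><td>2</td><td>3</td></tr>
  </tbody>
</table>
HTML;

/**
 * Returns the trimmed cell texts of every row that contains <td> cells,
 * one array per row. (Hypothetical helper, not part of any library.)
 */
function readTableRows(string $html, string $tableId): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows = [];
    // "//tr[td]" matches rows at any depth under the table, so it works
    // whether or not the source actually contains a <tbody> element.
    foreach ($xpath->query('//table[@id="' . $tableId . '"]//tr[td]') as $tr) {
        $cells = [];
        // Relative query: only the <td> children of the current row.
        foreach ($xpath->query('td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

foreach (readTableRows($html, 'prontosoccorso') as $cells) {
    echo implode(' | ', $cells), PHP_EOL;
}
```

With the live page you would pass the string returned by curl_exec() instead of the sample markup.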

<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);

$url = 'http://prontosoccorso.usl4.toscana.it/attesa/home.asp';

//#Set CURL parameters: pay attention to the PROXY config !!!!
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '');
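// Keep cookies between requests; the values only showed up once
// cookies were stored between calls.
$cookies = "./cookie.txt";
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookies);

// Deliberately requested twice: the first response contains only zeros.
$data = curl_exec($ch);
$data = curl_exec($ch);
curl_close($ch);

$dom = new DOMDocument();
@$dom->loadHTML($data); // suppress warnings from malformed HTML, as in the question's code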

$xpath = new DOMXPath($dom);
$xpath_for_parsing = '//table[@id="prontosoccorso"]/tbody/tr[2]/td';

$colorWaitingNumber = $xpath->query($xpath_for_parsing);

$theValue =  'N.D.';
foreach( $colorWaitingNumber as $node )
{
    echo $theValue = $node->nodeValue.PHP_EOL;
}

You may be able to add some logic that reloads the page if all the values are 0, but this code simply calls curl_exec() twice.
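
A minimal sketch of that reload logic, assuming the fetch is wrapped in a callable that returns the cell values as an array. The names allZero() and fetchUntilNonZero() are hypothetical, and the fake fetcher only imitates the zeros-first behaviour described above so that the example runs without a network connection.

```php
<?php
/**
 * True when every value casts to the integer 0 (or the list is empty).
 * Non-numeric text such as a row label also counts as 0 here.
 */
function allZero(array $values): bool
{
    foreach ($values as $value) {
        if ((int) trim($value) !== 0) {
            return false;
        }
    }
    return true;
}

/**
 * Calls $fetchValues up to $maxAttempts times and returns the first result
 * that is not all zeros, or the last result if every attempt was all zeros.
 * $fetchValues is any callable returning an array of cell strings.
 */
function fetchUntilNonZero(callable $fetchValues, int $maxAttempts = 3, int $delaySeconds = 0): array
{
    $values = [];
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $values = $fetchValues();
        if (!allZero($values)) {
            break;
        }
        // Optional pause before the next attempt.
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    return $values;
}

// Stand-in fetcher for illustration: zeros on the first call, then
// non-zero numbers, mimicking the behaviour described above.
$calls = 0;
$fakeFetch = function () use (&$calls): array {
    $calls++;
    return $calls === 1 ? ['0', '0', '0'] : ['1', '4', '7'];
};

print_r(fetchUntilNonZero($fakeFetch));
```

In real use the callable would run curl_exec() on the same handle (so the cookies persist) and apply the XPath query. The attempt cap matters because a row of zeros can also be a genuine reading.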

Upvotes: 1
