Cesare

Reputation: 1749

Right Xpath for HTML elements?

I need to scrape this HTML page:

http://www1.usl3.toscana.it/default.asp?page=ps&ospedale=3

I want to use PHP and XPath to get the value 7 shown next to the string "CODICE GIALLO".

(Note: you may see a different value if you browse the page yourself. That doesn't matter, because the value changes dynamically.)

I'm using this PHP code sample to print the value:

<?php
    ini_set('display_errors', 'On');
    error_reporting(E_ALL);

    $url = 'http://www1.usl3.toscana.it/default.asp?page=ps&ospedale=3';

    $xpath_for_parsing = '/html/body/div/div[2]/table[2]/tbody/tr[1]/td/table/tbody/tr[3]/td[2]/table/tbody/tr[4]/td[2]/table/tbody/tr[2]/td[2]/b';

    // Set cURL parameters; pay attention to the proxy configuration.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, '');
    $data = curl_exec($ch);
    curl_close($ch);

    $dom = new DOMDocument();
    @$dom->loadHTML($data);

    $xpath = new DOMXPath($dom);

    $colorWaitingNumber = $xpath->query($xpath_for_parsing);
    $theValue =  'N.D.';
    foreach( $colorWaitingNumber as $node )
    {
      $theValue = $node->nodeValue;
    }

    print $theValue;
?>

This way I get "N.D." as output, not "7" as I expected.

After reading "Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?", I saw that the problem could be the <tbody> tag, so I removed it from my original XPath and tried my code using:

$xpath_for_parsing = '/html/body/div/div[2]/table[2]/tr[1]/td/table/tr[3]/td[2]/table/tr[4]/td[2]/table/tr[2]/td[2]/b';

but the result is still "N.D." instead of "7".

Using

$xpath_for_parsing = '/html/body/div/div[2]/table[2]/tr[1]/td/table/tr[3]/td[2]/table/tr[4]/td[2]/table';

the result is "Codice GIALLO 7".

How can I obtain only the value "7"?

Any suggestions or examples?

Upvotes: 0

Views: 159

Answers (1)

Andersson

Reputation: 52665

This one should do the trick:

//td[.="Codice GIALLO"]/following-sibling::td/b
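To illustrate how that expression could be plugged into the question's DOMXPath code, here is a minimal sketch. The `$html` string is a hypothetical, simplified stand-in for the real page (its actual markup is an assumption and has not been verified), so treat this as a demonstration of the technique rather than a tested fix for that site.

```php
<?php
// Hypothetical sample markup standing in for the real page (an assumption).
$html = <<<HTML
<html><body>
<table>
  <tr><td>Codice ROSSO</td><td><b>1</b></td></tr>
  <tr><td>Codice GIALLO</td><td><b>7</b></td></tr>
</table>
</body></html>
HTML;

// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Find the cell whose text is exactly "Codice GIALLO", then take the <b>
// inside a following sibling cell. The match is case-sensitive.
$nodes = $xpath->query('//td[.="Codice GIALLO"]/following-sibling::td/b');

$theValue = 'N.D.';
if ($nodes !== false && $nodes->length > 0) {
    $theValue = trim($nodes->item(0)->nodeValue);
}

print $theValue;
```

Because the expression anchors on the label text instead of an absolute path, it does not depend on whether the parser sees <tbody> elements. If the real cell contains surrounding whitespace, replacing `.` with `normalize-space(.)` in the predicate may be needed.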

Upvotes: 1
