Aptivus

Reputation: 433

regex php find data in a table

I am trying to extract the total yearly solar irradiation, along with some other values, from a table that I fetch with curl from the European pv_gis.

The table I get is:

<table class=data_table border="1" width="300" >
<tr> <td> Jan </td><td align="right">2.27</td><td align="right">70.3</td><td align="right">2.86</td><td align="right">88.5</td></tr>
<tr> <td> Feb </td><td align="right">2.79</td><td align="right">78.0</td><td align="right">3.56</td><td align="right">99.7</td></tr>
<tr> <td> Mar </td><td align="right">3.59</td><td align="right">111</td><td align="right">4.74</td><td align="right">147</td></tr>
<tr> <td> Apr </td><td align="right">4.23</td><td align="right">127</td><td align="right">5.68</td><td align="right">171</td></tr>
<tr> <td> May </td><td align="right">4.46</td><td align="right">138</td><td align="right">6.13</td><td align="right">190</td></tr>
<tr> <td> Jun </td><td align="right">4.53</td><td align="right">136</td><td align="right">6.38</td><td align="right">191</td></tr>
<tr> <td> Jul </td><td align="right">4.74</td><td align="right">147</td><td align="right">6.70</td><td align="right">208</td></tr>
<tr> <td> Aug </td><td align="right">4.59</td><td align="right">142</td><td align="right">6.53</td><td align="right">202</td></tr>
<tr> <td> Sep </td><td align="right">4.32</td><td align="right">130</td><td align="right">5.96</td><td align="right">179</td></tr>
<tr> <td> Oct </td><td align="right">3.63</td><td align="right">113</td><td align="right">4.87</td><td align="right">151</td></tr>
<tr> <td> Nov </td><td align="right">2.64</td><td align="right">79.1</td><td align="right">3.41</td><td align="right">102</td></tr>
<tr> <td> Dec </td><td align="right">2.15</td><td align="right">66.5</td><td align="right">2.72</td><td align="right">84.3</td></tr>
<tr><td colspan=5> </td></tr>
<tr><td><b> Yearly average </b></td><td align="right"><b>3.67 </b></td><td align="right"><b>111 </b></td></td><td align="right"><b>4.97 </b></td><td align="right"><b>151 </b></td></tr>
<tr><td><b>Total for year</b></td><td align="right" colspan=2 ><b>  1340 </b> </td> <td align="right" colspan=2 ><b>  1810 </b> </td> </tr>
</table>

As you can see, the total values are contained in the last row (<tr>) of that table. Specifically, the total yearly value I need is in the second cell (<td>) of that row.

Now, I have tried to use txt2reg tools to build a regular expression, but without success, as I don't know how to target the last row of the above-mentioned table.

By deleting all the TR and TD tags I get one endless string of numbers, but at that point the values run together and can no longer be told apart.

Do you guys have some suggestions?

Thank you very much.

EDIT

I did the following, but I get an error. The error is:

Catchable fatal error: Argument 1 passed to DOMXPath::__construct() must be an instance of DOMDocument, instance of DOMElement given in C:\Users\test\www2\test_pvgis.php on line 49

And the code is:

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($varResponse);

$table = $doc->getElementsByTagName('table')->item(1); 

print_r($table);


$xpath = new DOMXpath($table);

$lastRow = $xpath->query("(//tr)[last()]");

// look for td elements inside the last row we isolated above
// path for td elements is relative
$cells = $xpath->query('./td',$lastRow[0]);

// you can also store the values for later use
foreach($cells as $key=>$cell){
    //we are ignoring the first key, since it holds the "Total for year" bit

    if ($key != 0){
        $store[] = trim($cell->nodeValue); // trim out the leading and trailing spaces
    }
}
print_r($store);

The error is located here: $xpath = new DOMXpath($table); but I have no idea why. Any clue?

Upvotes: 1

Views: 115

Answers (1)

Alex Andrei

Reputation: 7283

Edit

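Assuming the response contains several tables and the one selected by item(1) is the relevant one (note that item() is zero-based, so item(1) is the second table in the document).
The DOMXpath constructor needs a DOMDocument instance, not a DOMElement, which is why passing $table fails.
So create it from $doc: $xpath = new DOMXpath($doc);
Then, when you query for the last row, pass the $table element as the second parameter so the query is evaluated relative to that table.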


Here's an example using DOMDocument and DOMXpath

// start edit
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($varResponse);

$table = $doc->getElementsByTagName('table')->item(1); 

print_r($table);

$xpath = new DOMXpath($doc);

$lastRow = $xpath->query("(./tr)[last()]",$table);
// end edit

// look for td elements inside the last row we isolated above
// path for td elements is relative
$cells = $xpath->query('./td',$lastRow->item(0)); // item(0) instead of [0]: fixes 'Cannot use object of type DOMNodeList as array'

// you can also store the values for later use
foreach($cells as $key=>$cell){
    //we are ignoring the first key, since it holds the "Total for year" bit

    if ($key != 0){
        $store[] = trim($cell->nodeValue); // trim out the leading and trailing spaces
    }
}
print_r($store);
/*
outputs
Array
(
    [0] => 1340
    [1] => 1810
)
*/
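
To experiment without a live request, the same approach can be packed into a small self-contained function. This is only a sketch under a few assumptions: extractYearlyTotals() is a hypothetical helper name, $html is a trimmed copy of the table from the question standing in for $varResponse, and the table index defaults to 0 because the sample holds a single table (the code above uses item(1)). It queries .//tr rather than ./tr so the row is still found if the markup happens to wrap its rows in a tbody, and it skips the label cell inside the XPath expression instead of in the loop.

```php
<?php
// Hypothetical stand-in for $varResponse: two rows copied from the question's table.
$html = <<<HTML
<table class=data_table border="1" width="300" >
<tr> <td> Dec </td><td align="right">2.15</td><td align="right">66.5</td><td align="right">2.72</td><td align="right">84.3</td></tr>
<tr><td><b>Total for year</b></td><td align="right" colspan=2 ><b>  1340 </b> </td> <td align="right" colspan=2 ><b>  1810 </b> </td> </tr>
</table>
HTML;

// Hypothetical helper: returns the trimmed cell values of the last row of the
// chosen table, without the leading label cell. Returns [] if nothing matches.
function extractYearlyTotals(string $html, int $tableIndex = 0): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $doc->getElementsByTagName('table')->item($tableIndex);
    if ($table === null) {
        return []; // no table at that index
    }

    // DOMXPath is built from the document; the table is only the context node.
    $xpath = new DOMXPath($doc);
    $rows = $xpath->query('(.//tr)[last()]', $table);
    if ($rows === false || $rows->length === 0) {
        return []; // table has no rows
    }

    // Skip the first cell, which holds the "Total for year" label.
    $totals = [];
    foreach ($xpath->query('./td[position() > 1]', $rows->item(0)) as $cell) {
        $totals[] = trim($cell->textContent);
    }
    return $totals;
}

print_r(extractYearlyTotals($html));
```

The values come back as strings, as in the output above; cast them with (float) if you need to do arithmetic on them.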

Upvotes: 2
