Nuno Reis

Reputation: 29

Need advice on parsing an HTML table with PHP

I was following this question on the subject: How to parse this table and extract data from it?

But I got stumped on the table I am trying to parse.

This is the source of the PHP page. It contains only one table, with id "troops".

I managed to get the table headers into an array, but I can't connect the row data with the headers.

This is the code I am using; it's from the question above, edited to my needs.

HTML source code: http://pastebin.com/RKbzVT1V

PHP code used:

$content = $_POST['src'];
$dom = new DOMDocument;
$dom->loadHTML($content);

$xpath = new DOMXPath($dom);

// collect header names
$headerNames = array();
foreach ($xpath->query('//table[@id="troops"]//th') as $node) {
//foreach ($xpath->query('//th[ contains (@class, "vil fc") ]') as $node) {
    $headerNames[] = $node->nodeValue;
}

// collect data
$data = array();
foreach ($xpath->query('//tr') as $node) {
    $rowData = array();
    foreach ($xpath->query('//td', $node) as $cell) {
        $rowData[] = $cell->nodeValue;
    }

    $data[] = array_combine($headerNames, $rowData);
}

Any help on this matter is appreciated. If there is an easier way, please advise.

Upvotes: 2

Views: 420

Answers (1)

Gordon

Reputation: 316969

Running your code I get:

PHP Warning: array_combine(): Both parameters should have an equal number of elements

This means the number of items in $headerNames does not equal the number of items in $rowData. Your $rowData contains TD elements, but if you look at the HTML you will see that a row has many more TD elements than TH elements:

<tr class="hover">
 <th class="vil fc">
     <a href="build.php?newdid=3665&id=39#td">00 La piu …</a>
 </th>
 <td>54</td>
 <td>5</td>
 <td class="none">0</td>
 <td>74</td>
 <td>355</td>
 <td class="none">0</td>
 <td class="none">0</td>
 <td class="none">0</td>
 <td class="none">0</td>
 <td class="none">0</td>
 <td class="none lc">0</td>
</tr>
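
A separate detail in the question's code is worth checking: an XPath expression starting with // is absolute, so passing a context node to query() does not restrict it to that row. The sketch below illustrates this on a small hypothetical two-row table (the markup and cell values are made up for the demonstration, not taken from the real page); .//td is the relative form.

```php
<?php
// Hypothetical two-row table, loosely modeled on the row shown above.
$html = '<table id="troops">'
      . '<tr><th>A</th><td>1</td><td>2</td></tr>'
      . '<tr><th>B</th><td>3</td><td>4</td></tr>'
      . '</table>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Use the first row of the table as the context node.
$firstRow = $xpath->query('//table[@id="troops"]//tr')->item(0);

// '//td' starts at the document root, even though a context node is passed.
$absolute = $xpath->query('//td', $firstRow)->length;

// './/td' is evaluated relative to the context node, so only this row's cells match.
$relative = $xpath->query('.//td', $firstRow)->length;

echo "absolute: $absolute, relative: $relative\n"; // absolute: 4, relative: 2
```

So in the question's inner loop, //td would pick up every TD in the document on each iteration, which makes the count mismatch even larger.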

I assume you are trying to achieve something like this:

[00 La piu …] => Array
    (
        [0] => 54
        [1] => 5
        [2] => 0
        [3] => 74
        [4] => 355
        [5] => 0
        [6] => 0
        [7] => 0
        [8] => 0
        [9] => 0
        [10] => 0
    )

which the following code will produce:

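// Collect libxml parse warnings internally instead of emitting them
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTMLFile('NewHTMLFile.html');
$table = $dom->getElementById('troops');
$data = array();
foreach ($table->getElementsByTagName('tr') as $tr) {
    // Only rows that contain a TH (used as the row label) are of interest
    if ($header = $tr->getElementsByTagName('th')->item(0)) {
        // Key the row by its trimmed TH text; the value is the list of TD texts
        $data[trim($header->nodeValue)] = array_map(
            function(DOMElement $td) { return $td->nodeValue; },
            iterator_to_array($tr->getElementsByTagName('td'))
        );
    }
}
libxml_clear_errors();
libxml_use_internal_errors(false);
print_r($data);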

If this is not what you are looking for, please update your question and include a sample of the output you are trying to get.
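
If the goal is instead to key each value by its column header, something along these lines might work. This is only a sketch under assumptions: it supposes the table has a header row made up solely of TH cells, whose first TH labels the name column, and the markup, village names and "Unit A"/"Unit B" labels are hypothetical since I am guessing at the output you want.

```php
<?php
// Hypothetical markup: one header row of TH cells, then data rows
// that each start with a TH (row name) followed by TD values.
$html = '<table id="troops">'
      . '<tr><th>Village</th><th>Unit A</th><th>Unit B</th></tr>'
      . '<tr><th class="vil fc">Alpha</th><td>54</td><td>5</td></tr>'
      . '<tr><th class="vil fc">Beta</th><td>0</td><td>74</td></tr>'
      . '</table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$data = array();
$columns = null;
foreach ($xpath->query('//table[@id="troops"]//tr') as $tr) {
    $tds = $xpath->query('./td', $tr);
    if ($tds->length === 0) {
        // Assumption: a row without TD cells is the header row.
        $columns = array();
        foreach ($xpath->query('./th', $tr) as $th) {
            $columns[] = trim($th->nodeValue);
        }
        array_shift($columns); // drop the label of the row-name column
        continue;
    }
    $values = array();
    foreach ($tds as $td) {
        $values[] = trim($td->nodeValue);
    }
    $name = $xpath->query('./th', $tr)->item(0);
    // Combine only when a header row was seen and the counts match,
    // which avoids the array_combine() warning from the question.
    if ($name !== null && $columns !== null && count($columns) === count($values)) {
        $data[trim($name->nodeValue)] = array_combine($columns, $values);
    }
}
print_r($data);
```

Rows whose cell count does not match the header row are skipped silently here; you may prefer to log them.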

Upvotes: 2
