Reputation: 29
I have been working from this question on the same topic: "How to parse this table and extract data from it?"
However, I got stuck on the table I am trying to parse.
Below is the page source. It contains only one table, with the id "troops".
I managed to get the table headers into an array, but I can't connect the row data with the headers.
This is the code I am using. It is taken from the question above and edited to my needs.
HTML source: http://pastebin.com/RKbzVT1V
PHP code used:
$content = $_POST['src'];
$dom = new DOMDocument;
$dom->loadHTML($content);
$xpath = new DOMXPath($dom);
// collect header names
$headerNames = array();
foreach ($xpath->query('//table[@id="troops"]//th') as $node) {
//foreach ($xpath->query('//th[ contains (@class, "vil fc") ]') as $node) {
    $headerNames[] = $node->nodeValue;
}
// collect data
$data = array();
foreach ($xpath->query('//tr') as $node) {
    $rowData = array();
    foreach ($xpath->query('//td', $node) as $cell) {
        $rowData[] = $cell->nodeValue;
    }
    $data[] = array_combine($headerNames, $rowData);
}
Any help is appreciated. If there is an easier way, please advise.
Upvotes: 2
Views: 420
Reputation: 316969
Running your code I get:
PHP Warning: array_combine(): Both parameters should have an equal number of elements
This means the number of items in $headerNames does not equal the number of items in $rowData. Your $rowData contains all TD elements of a row, but if you look at the HTML you will see that there are many more TD elements than TH elements:
<tr class="hover">
    <th class="vil fc">
        <a href="build.php?newdid=3665&id=39#td">00 La piu …</a>
    </th>
    <td>54</td>
    <td>5</td>
    <td class="none">0</td>
    <td>74</td>
    <td>355</td>
    <td class="none">0</td>
    <td class="none">0</td>
    <td class="none">0</td>
    <td class="none">0</td>
    <td class="none">0</td>
    <td class="none lc">0</td>
</tr>
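To check the mismatch yourself, a minimal sketch along these lines counts the TH and TD elements of a single row before anything is passed to array_combine(). The inline HTML is a shortened, made-up stand-in for the real page, and $html, $thCount and $tdCount are names chosen only for this illustration.

```php
<?php
// Hypothetical, shortened stand-in for the real page source.
$html = '<table id="troops"><tr class="hover">'
      . '<th class="vil fc"><a href="#">Village A</a></th>'
      . '<td>54</td><td>5</td><td class="none">0</td>'
      . '</tr></table>';

$dom = new DOMDocument;
$dom->loadHTML($html);

// Look at the first row only and count both cell types.
$tr = $dom->getElementsByTagName('tr')->item(0);
$thCount = $tr->getElementsByTagName('th')->length;
$tdCount = $tr->getElementsByTagName('td')->length;

// array_combine() needs both arrays to have the same length.
printf("th: %d, td: %d\n", $thCount, $tdCount);
```

Whenever the two counts differ, array_combine() cannot pair keys with values. On PHP 8 and later it throws a ValueError in that case instead of emitting the warning shown above.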
I assume you are trying to achieve something like this:
[00 La piu …] => Array
    (
        [0] => 54
        [1] => 5
        [2] => 0
        [3] => 74
        [4] => 355
        [5] => 0
        [6] => 0
        [7] => 0
        [8] => 0
        [9] => 0
        [10] => 0
    )
which the following code will produce:
// collect libxml parse errors internally instead of emitting warnings
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTMLFile('NewHTMLFile.html');
$table = $dom->getElementById('troops');
$data = array();
foreach ($table->getElementsByTagName('tr') as $tr) {
    // skip rows without a TH; the first TH is used as the row key
    if ($header = $tr->getElementsByTagName('th')->item(0)) {
        $data[trim($header->nodeValue)] = array_map(
            function (DOMElement $td) { return $td->nodeValue; },
            iterator_to_array($tr->getElementsByTagName('td'))
        );
    }
}
libxml_use_internal_errors(false);
print_r($data);
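If you would rather stay with DOMXPath as in the question, roughly the same result can be sketched as follows. One detail to watch: an XPath expression that starts with // is evaluated from the document root even when a context node is passed, so the inner query needs a leading dot (.//td) to stay inside the current row. The HTML below is a made-up two-row sample, not the real page, and the village names are placeholders.

```php
<?php
// Made-up sample; the real page source is not reproduced here.
$html = '<table id="troops">'
      . '<tr><th>Village A</th><td>54</td><td>5</td></tr>'
      . '<tr><th>Village B</th><td>7</td><td>0</td></tr>'
      . '</table>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$data = array();
// Select only rows of the "troops" table that have a th child.
foreach ($xpath->query('//table[@id="troops"]//tr[th]') as $tr) {
    // The first th of the row becomes the key.
    $key = trim($xpath->query('./th', $tr)->item(0)->nodeValue);
    $row = array();
    // The leading dot keeps the query relative to this row.
    foreach ($xpath->query('.//td', $tr) as $td) {
        $row[] = $td->nodeValue;
    }
    $data[$key] = $row;
}
print_r($data);
```

With the sample above, $data ends up with one entry per village, each holding that row's cell values as strings.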
If this is not what you are looking for, please update your question and include a sample of the output you are trying to get.
Upvotes: 2