Reputation: 79
I'm trying to get multiple hrefs from a table like this:
<table class="table table-bordered table-hover">
<thead>
<tr>
<th class="text-center">No</th>
<th>TITLE</th>
<th>DESCRIPTION</th>
<th class="text-center"><span class="glyphicon glyphicon-download-alt"></span></th>
</tr>
</thead>
<tbody>
<tr data-key="11e44c4ebff985d08ca5313231363233">
<td class="text-center" style="width: 50px;">181</td>
<td style="width:auto; white-space: normal;"><a href="link-1.html">Link 1</a></td>
<td style="width:auto; white-space: normal;">Lorem ipsum dolor 1</td>
<td class="text-center" style="width: 50px;"><a href="link-1.pdf" title="Download" target="_blank"><img src="https://example.com/img/pdf.png" width="15" height="20" alt="myImage"></a></td>
</tr>
<tr data-key="11e44c4e4222d630bdd2313231323532">
<td class="text-center" style="width: 50px;">180</td>
<td style="width:auto; white-space: normal;"><a href="link-2.html">Link 2</a></td>
<td style="width:auto; white-space: normal;">Lorem ipsum dolor 2</td>
<td class="text-center" style="width: 50px;"><a href="link-2.pdf" title="Download" target="_blank"><img src="https://example.com/img/pdf.png" width="15" height="20" alt="myImage"></a></td>
</tr>
</tbody>
</table>
I tried PHP DOM like this:
<?php
$html = file_get_contents('data2.html');
$htmlDom = new DOMDocument;
$htmlDom->preserveWhiteSpace = false;
$htmlDom->loadHTML($html);
$tables = $htmlDom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row)
{
$cols = $row->getElementsByTagName('td');
echo @$cols->item(0)->nodeValue.'<br />';
echo @$cols->item(1)->nodeValue.'<br />';
echo trim($cols->item(1)->getElementsByTagName('a')->item(0)->getAttribute('href')).'<br />';
echo @$cols->item(2)->nodeValue.'<br />';
echo trim($cols->item(3)->getElementsByTagName('a')->item(0)->getAttribute('href')).'<br />';
}
?>
I get this error:
Fatal error: Uncaught Error: Call to a member function getElementsByTagName() on null
The lines that call getAttribute cause the error.
Could someone help me out here, please? Thanks.
Upvotes: 1
Views: 557
Reputation: 7863
Your $rows holds "all the <tr> within <table>". That catches not only the <tr> elements in the table body but also the one in the table head, which has no <td> in it. Hence, when reading that row, $cols->item(0) and $cols->item(1) both return NULL.
You should have taken the hint when your code couldn't read the ->nodeValue property on those items (which is why you added the @ sign to suppress the warning).
Try to change this:
$rows = $tables->item(0)->getElementsByTagName('tr');
into this:
$rows = $tables
->item(0)->getElementsByTagName('tbody')
->item(0)->getElementsByTagName('tr');
Now it only searches for <tr> within your <tbody>, which should fix your issue with this particular HTML.
For more robust code, check the variables before acting on them. A type check or a count check would be good.
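As a rough sketch of what such checks could look like (the helper name firstHref and the trimmed-down sample HTML are my own, not from the question):

```php
<?php
// Hypothetical helper: returns the href of the first <a> inside a cell,
// or null if the cell or the link is missing.
function firstHref(?DOMElement $cell): ?string
{
    if ($cell === null) {
        return null;
    }
    // DOMNodeList::item() returns null when the index is out of range.
    $link = $cell->getElementsByTagName('a')->item(0);
    return $link instanceof DOMElement ? trim($link->getAttribute('href')) : null;
}

// Shortened stand-in for data2.html (assumption, for illustration only).
$html = '<table><thead><tr><th>No</th><th>TITLE</th></tr></thead>'
      . '<tbody><tr><td>181</td><td><a href="link-1.html">Link 1</a></td></tr></tbody></table>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$hrefs = [];
foreach ($dom->getElementsByTagName('tr') as $row) {
    $cols = $row->getElementsByTagName('td');
    if ($cols->length < 2) {
        continue; // header row (only <th>) or a row with too few cells
    }
    $href = firstHref($cols->item(1));
    if ($href !== null) {
        $hrefs[] = $href;
    }
}
print_r($hrefs);
```

With checks like these the @ signs are no longer needed, and a row with an unexpected shape is skipped instead of causing a fatal error.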
Upvotes: 1
Reputation: 57121
Since the earlier accesses to the $cols node list all use @ to suppress the errors, the getElementsByTagName() call on $cols->item(1) is the first one that complains.
A simple fix would be to just skip the rest of the code if no <td> elements are found (such as the header row)...
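foreach ($rows as $row)
{
    $cols = $row->getElementsByTagName('td');
    // Skip rows without <td> cells, such as the header row.
    // ->length also works on PHP versions where DOMNodeList is not Countable.
    if ( $cols->length == 0 ) {
        continue;
    }
    // ... the echo statements from the question go here
}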
You could alternatively use XPath and only select <tr> tags which contain <td> tags.
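A minimal sketch of the XPath variant, assuming a cut-down version of the table from the question (the inline $html string and the $links array are just for illustration):

```php
<?php
// Shortened stand-in for data2.html (assumption, for illustration only).
$html = '<table><thead><tr><th>No</th><th>TITLE</th><th>PDF</th></tr></thead><tbody>'
      . '<tr><td>181</td><td><a href="link-1.html">Link 1</a></td><td><a href="link-1.pdf">PDF</a></td></tr>'
      . '<tr><td>180</td><td><a href="link-2.html">Link 2</a></td><td><a href="link-2.pdf">PDF</a></td></tr>'
      . '</tbody></table>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// tr[td] matches only rows with at least one <td> child,
// so the header row (which has only <th>) is never selected.
$rows = $xpath->query('//table//tr[td]');

$links = [];
foreach ($rows as $row) {
    $rowLinks = [];
    // Relative query: every href attribute of an <a> inside this row's cells.
    foreach ($xpath->query('td//a/@href', $row) as $attr) {
        $rowLinks[] = trim($attr->nodeValue);
    }
    $links[] = $rowLinks;
}
print_r($links);
```

This way there is nothing to skip inside the loop, because the header row is filtered out by the query itself.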
Upvotes: 1