fitra

PHP DOM: get href attributes from a table

I'm trying to get multiple href attributes from a table like this:

<table class="table table-bordered table-hover">
   <thead>
      <tr>
         <th class="text-center">No</th>
         <th>TITLE</th>
         <th>DESCRIPTION</th>
         <th class="text-center"><span class="glyphicon glyphicon-download-alt"></span></th>
      </tr>
   </thead>
   <tbody>
    <tr data-key="11e44c4ebff985d08ca5313231363233">
       <td class="text-center" style="width: 50px;">181</td>
       <td style="width:auto; white-space: normal;"><a href="link-1.html">Link 1</a></td>
       <td style="width:auto; white-space: normal;">Lorem ipsum dolor 1</td>
       <td class="text-center" style="width: 50px;"><a href="link-1.pdf" title="Download" target="_blank"><img src="https://example.com/img/pdf.png" width="15" height="20" alt="myImage"></a></td>
    </tr>
    <tr data-key="11e44c4e4222d630bdd2313231323532">
       <td class="text-center" style="width: 50px;">180</td>
       <td style="width:auto; white-space: normal;"><a href="link-2.html">Link 2</a></td>
       <td style="width:auto; white-space: normal;">Lorem ipsum dolor 2</td>
       <td class="text-center" style="width: 50px;"><a href="link-2.pdf" title="Download" target="_blank"><img src="https://example.com/img/pdf.png" width="15" height="20" alt="myImage"></a></td>
    </tr>
    </tbody>
</table>

I tried using PHP DOM like this:

<?php
$html = file_get_contents('data2.html');
 
$htmlDom = new DOMDocument;
$htmlDom->preserveWhiteSpace = false; 
$htmlDom->loadHTML($html);
$tables = $htmlDom->getElementsByTagName('table'); 
$rows = $tables->item(0)->getElementsByTagName('tr'); 

foreach ($rows as $row) 
  { 
      $cols = $row->getElementsByTagName('td'); 
      echo @$cols->item(0)->nodeValue.'<br />'; 
      echo @$cols->item(1)->nodeValue.'<br />'; 
      echo trim($cols->item(1)->getElementsByTagName('a')->item(0)->getAttribute('href')).'<br />';
      echo @$cols->item(2)->nodeValue.'<br />'; 
      echo trim($cols->item(3)->getElementsByTagName('a')->item(0)->getAttribute('href')).'<br />';
   } 
?>

I get this error:

Fatal error: Uncaught Error: Call to a member function getElementsByTagName() on null

It seems that the lines with getAttribute() cause the error.

Could someone help me out here, please? Thanks.

Answers (2)

Koala Yeung

Your $rows are the result of "all the <tr> elements within <table>". That catches not only the <tr> rows in the table body but also the one in your table head, which has no <td> in it. Hence, when that row is read, $cols->item(0) and $cols->item(1) both return NULL, and calling getElementsByTagName() on $cols->item(1) is what throws the fatal error.
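To see this for yourself, here is a small sketch that counts the <td> cells in every <tr> the original query returns. It uses a trimmed, inline copy of the question's table (assumption: this stands in for data2.html) so it runs on its own.

```php
<?php
// Trimmed copy of the question's table: one header row, two body rows.
$html = <<<'HTML'
<table>
  <thead><tr><th>No</th><th>TITLE</th></tr></thead>
  <tbody>
    <tr><td>181</td><td><a href="link-1.html">Link 1</a></td></tr>
    <tr><td>180</td><td><a href="link-2.html">Link 2</a></td></tr>
  </tbody>
</table>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

// Same query as in the question: every <tr> descendant of the table,
// including the one inside <thead>.
$rows = $dom->getElementsByTagName('table')->item(0)->getElementsByTagName('tr');

$tdCounts = [];
foreach ($rows as $row) {
    $tdCounts[] = $row->getElementsByTagName('td')->length;
}

echo 'rows found: ' . $rows->length . PHP_EOL;             // rows found: 3
echo 'td per row: ' . implode(', ', $tdCounts) . PHP_EOL;  // td per row: 0, 2, 2

// For the header row there is no second <td>, so item(1) is null.
var_dump($rows->item(0)->getElementsByTagName('td')->item(1)); // NULL
```

The first row reports zero cells, which is exactly the row that makes the chained call fail.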

You should have taken the hint when your code couldn't read ->nodeValue from those items (which is why you added the @ sign to suppress the warning).

Try changing this:

$rows = $tables->item(0)->getElementsByTagName('tr'); 

into this:

$rows = $tables
        ->item(0)->getElementsByTagName('tbody')
        ->item(0)->getElementsByTagName('tr');

Now it only searches for the <tr> elements within your <tbody>, which should fix the issue with this particular HTML.

For more robust code, you should check the variables before acting on them. A type check (e.g. for null) or a count check would be a good start.
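A minimal sketch of what those checks could look like, combining the <tbody> lookup with null and count checks. The inline HTML is a trimmed copy of the question's table (assumption: it replaces data2.html), firstHref() is a hypothetical helper, and the array keys ('no', 'title', 'page', 'desc', 'pdf') are illustrative names, not anything from the question.

```php
<?php
// Trimmed copy of the question's table (inline instead of reading data2.html).
$html = <<<'HTML'
<table>
  <thead>
    <tr><th>No</th><th>TITLE</th><th>DESCRIPTION</th><th></th></tr>
  </thead>
  <tbody>
    <tr><td>181</td><td><a href="link-1.html">Link 1</a></td><td>Lorem ipsum dolor 1</td><td><a href="link-1.pdf">PDF</a></td></tr>
    <tr><td>180</td><td><a href="link-2.html">Link 2</a></td><td>Lorem ipsum dolor 2</td><td><a href="link-2.pdf">PDF</a></td></tr>
  </tbody>
</table>
HTML;

// Hypothetical helper: href of the first <a> inside $cell, or null if there is none.
function firstHref(?DOMElement $cell): ?string
{
    if ($cell === null) {
        return null;
    }
    $link = $cell->getElementsByTagName('a')->item(0);
    return $link === null ? null : trim($link->getAttribute('href'));
}

$dom = new DOMDocument();
$dom->loadHTML($html);

$results = [];
$table = $dom->getElementsByTagName('table')->item(0);
// Only look inside <tbody>, so the <thead> row is never visited.
$tbody = $table === null ? null : $table->getElementsByTagName('tbody')->item(0);

if ($tbody !== null) {
    foreach ($tbody->getElementsByTagName('tr') as $row) {
        $cols = $row->getElementsByTagName('td');
        // Count check: skip any row that doesn't have the expected four cells.
        if ($cols->length < 4) {
            continue;
        }
        $results[] = [
            'no'    => trim($cols->item(0)->nodeValue),
            'title' => trim($cols->item(1)->nodeValue),
            'page'  => firstHref($cols->item(1)),
            'desc'  => trim($cols->item(2)->nodeValue),
            'pdf'   => firstHref($cols->item(3)),
        ];
    }
}

foreach ($results as $r) {
    echo implode(' | ', $r), PHP_EOL;
}
```

For the sample table this prints one line per body row, e.g. `181 | Link 1 | link-1.html | Lorem ipsum dolor 1 | link-1.pdf`. No @ is needed, because every nullable value is checked before a method is called on it.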

Nigel Ren

Since all the previous accesses to $cols are prefixed with @ to suppress the errors, this is the first line that complains.

A simple fix would be to skip the rest of the loop body if no <td> elements are found (as in the header row):

foreach ($rows as $row)
{
    $cols = $row->getElementsByTagName('td');
    // Skip rows that have no <td> cells, such as the header row
    if ( count($cols) == 0 )    {
        continue;
    }
    // ... the rest of the original loop body (the echo statements) stays the same
}

Alternatively, you could use XPath and select only the <tr> elements that contain <td> elements.
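A rough sketch of the XPath route, using PHP's DOMXPath. The predicate tr[td] keeps only rows with at least one <td> child, so the header row drops out without any extra check. The inline HTML is again a trimmed copy of the question's table (assumption: it stands in for data2.html).

```php
<?php
// Trimmed copy of the question's table (inline instead of reading data2.html).
$html = <<<'HTML'
<table>
  <thead>
    <tr><th>No</th><th>TITLE</th><th>DESCRIPTION</th><th></th></tr>
  </thead>
  <tbody>
    <tr><td>181</td><td><a href="link-1.html">Link 1</a></td><td>Lorem ipsum dolor 1</td><td><a href="link-1.pdf">PDF</a></td></tr>
    <tr><td>180</td><td><a href="link-2.html">Link 2</a></td><td>Lorem ipsum dolor 2</td><td><a href="link-2.pdf">PDF</a></td></tr>
  </tbody>
</table>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// tr[td] matches only <tr> elements that have at least one <td> child,
// so the header row (which contains only <th>) is never selected.
$rows = $xpath->query('//table//tr[td]');

$hrefs = [];
foreach ($rows as $row) {
    // Relative query: the href attribute of every <a> directly inside this row's cells.
    foreach ($xpath->query('./td/a/@href', $row) as $attr) {
        $hrefs[] = trim($attr->value);
    }
}

echo implode(PHP_EOL, $hrefs), PHP_EOL;
```

For the sample table this prints link-1.html, link-1.pdf, link-2.html and link-2.pdf, one per line, in document order.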
