Michael
Michael

Reputation: 524

How to use DOMDocument to get child elements?

I am trying to get the text of child elements using the PHP DOM.

Specifically, I am trying to get only the first <a> tag within every <tr>.

The HTML is like this...

<table>
<tbody>
    <tr>
        <td>
            <a href="#">1st Link</a>
        </td>
        <td>
            <a href="">2nd Link</a>
        </td>
        <td>
            <a href="#">3rd Link</a>
        </td>
    </tr>

    <tr>
        <td>
            <a href="#">1st Link</a>
        </td>
        <td>
            <a href="#">2nd Link</a>
        </td>
        <td>
            <a href="#">3rd Link</a>
        </td>
    </tr>
</tbody>
</table>

My sad attempt at it involved using foreach() loops, but would only return Array() when doing a print_r() on the $aVal.

$dom = new DOMDocument();
libxml_use_internal_errors(true);       
$dom->loadHTML(returnURLData($url));
libxml_use_internal_errors(false);
    
$tables = $dom->getElementsByTagName('table');
$aVal = array();

foreach ($tables as $table) {
    foreach ($table as $tr){
        $trVal = $tr->getElementsByTagName('tr');
        foreach ($trVal as $td){
            $tdVal = $td->getElementsByTagName('td');
            foreach($tdVal as $a){
                $aVal[] = $a->getElementsByTagName('a')->nodeValue;
            }
        }
    }
}

Am I on the right track or am I completely off?

Upvotes: 1

Views: 5767

Answers (3)

Sarthak Sawhney
Sarthak Sawhney

Reputation: 432

I am pretty sure I am late, but better way should be to iterate through all "tr" with getElementByTagName and then while iterating through each node in nodelist recieved use getElementByTagName"a". Now no need to iterate through nodeList point out the first element recieved by item(0). That's it! Another way can be to use xPath.

I personally don't like SimpleHtmlDom because of the loads of extra added features it uses where a small functionality is required. In case of heavy scraping also memory management issue can hold you back, its better if you yourself do DOM Analysis rather than depending thrid party application.

Just My opinion. Even I used SHD initially but later realized this.

Upvotes: 0

user2380429
user2380429

Reputation: 61

Put this code in test.php

require 'simple_html_dom.php';
$html = file_get_html('test1.php');
foreach($html->find('table tr') as $element)
{
    foreach($element->find('a',0) as $element)
    {
        echo $element->plaintext;
    }
}

and put your html code in test1.php

<table>
    <tbody>
        <tr>
            <td>
                <a href="#">1st Link</a>
            </td>
            <td>
                <a href="">2nd Link</a>
            </td>
            <td>
                <a href="#">3rd Link</a>
            </td>
        </tr>

        <tr>
            <td>
                <a href="#">1st Link</a>
            </td>
            <td>
                <a href="#">2nd Link</a>
            </td>
            <td>
                <a href="#">3rd Link</a>
            </td>
        </tr>
    </tbody>
</table>

Upvotes: 2

php_nub_qq
php_nub_qq

Reputation: 16025

You're not setting $trVal and $tdVal yet you're looping them ?

Upvotes: -1

Related Questions