Reputation: 524
I am trying to get the text of child elements using the PHP DOM.
Specifically, I am trying to get only the first <a> tag within every <tr>.
The HTML is like this...
<table>
<tbody>
<tr>
<td>
<a href="#">1st Link</a>
</td>
<td>
<a href="">2nd Link</a>
</td>
<td>
<a href="#">3rd Link</a>
</td>
</tr>
<tr>
<td>
<a href="#">1st Link</a>
</td>
<td>
<a href="#">2nd Link</a>
</td>
<td>
<a href="#">3rd Link</a>
</td>
</tr>
</tbody>
</table>
My sad attempt at it involved using foreach() loops, but it would only return Array() when doing a print_r() on $aVal.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML(returnURLData($url));
libxml_use_internal_errors(false);
$tables = $dom->getElementsByTagName('table');
$aVal = array();
foreach ($tables as $table) {
    foreach ($table as $tr) {
        $trVal = $tr->getElementsByTagName('tr');
        foreach ($trVal as $td) {
            $tdVal = $td->getElementsByTagName('td');
            foreach ($tdVal as $a) {
                $aVal[] = $a->getElementsByTagName('a')->nodeValue;
            }
        }
    }
}
Am I on the right track or am I completely off?
Upvotes: 1
Views: 5767
Reputation: 432
I am pretty sure I am late, but a better way is to iterate over all <tr> elements with getElementsByTagName('tr') and, for each row in the returned node list, call getElementsByTagName('a'). There is no need to loop over that second node list; just take its first element with item(0). That's it! Another way is to use XPath.
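The steps described above could look roughly like this. It is only a sketch: the function name firstLinkTexts is made up for illustration, and the sample markup is a condensed copy of the table from the question rather than a fetched page.

```php
<?php
// Condensed copy of the table from the question (assumption: real input
// would come from whatever fetches the page).
$html = <<<HTML
<table><tbody>
<tr><td><a href="#">1st Link</a></td><td><a href="">2nd Link</a></td><td><a href="#">3rd Link</a></td></tr>
<tr><td><a href="#">1st Link</a></td><td><a href="#">2nd Link</a></td><td><a href="#">3rd Link</a></td></tr>
</tbody></table>
HTML;

// Hypothetical helper: returns the text of the first <a> inside every <tr>.
function firstLinkTexts(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings quiet
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $texts = [];
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        // getElementsByTagName() returns a DOMNodeList of all descendants;
        // item(0) is the first one in document order, or null if none exist.
        $first = $tr->getElementsByTagName('a')->item(0);
        if ($first !== null) {
            $texts[] = trim($first->nodeValue);
        }
    }
    return $texts;
}

print_r(firstLinkTexts($html));
```

Note that nodeValue is read from a single node returned by item(0), not from the node list itself; a DOMNodeList has no nodeValue property.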
I personally don't like SimpleHtmlDom because it loads a lot of extra features when only a small piece of functionality is required. With heavy scraping, memory management issues can also hold you back; it's better to do the DOM analysis yourself than to depend on a third-party library.
Just my opinion. I used SHD initially too, but later realized this.
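For the XPath route mentioned above, a minimal sketch using the built-in DOMXPath class might look like this. The function name firstLinkTextsXPath and the one-row sample markup are assumptions for illustration only.

```php
<?php
// Hypothetical helper: same goal as before, but selecting nodes with XPath.
function firstLinkTextsXPath(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings quiet
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $xpath = new DOMXPath($dom);
    $texts = [];
    foreach ($xpath->query('//tr') as $tr) {
        // (.//a)[1] picks the first <a> descendant of this row.
        // Plain .//a[1] would instead match every <a> that is the first
        // <a> child of its own parent, i.e. one per <td> here.
        $first = $xpath->query('(.//a)[1]', $tr)->item(0);
        if ($first !== null) {
            $texts[] = trim($first->nodeValue);
        }
    }
    return $texts;
}

// Made-up sample row in the same shape as the question's table.
$sample = '<table><tr><td><a href="#">1st Link</a></td>'
        . '<td><a href="#">2nd Link</a></td></tr></table>';
print_r(firstLinkTextsXPath($sample));
```

The second argument to query() sets the context node, which is what makes the leading dot in the expression refer to the current row.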
Upvotes: 0
Reputation: 61
Put this code in test.php
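require 'simple_html_dom.php';
$html = file_get_html('test1.php');
foreach ($html->find('table tr') as $row)
{
    // find('a', 0) returns a single element (or null), not a list,
    // so it must not be looped over with foreach.
    $link = $row->find('a', 0);
    if ($link !== null)
    {
        echo $link->plaintext, "\n";
    }
}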
and put your html code in test1.php
<table>
<tbody>
<tr>
<td>
<a href="#">1st Link</a>
</td>
<td>
<a href="">2nd Link</a>
</td>
<td>
<a href="#">3rd Link</a>
</td>
</tr>
<tr>
<td>
<a href="#">1st Link</a>
</td>
<td>
<a href="#">2nd Link</a>
</td>
<td>
<a href="#">3rd Link</a>
</td>
</tr>
</tbody>
</table>
Upvotes: 2
Reputation: 16025
You're not setting $trVal and $tdVal, yet you're looping over them?
Upvotes: -1