I'm having a go at scraping a page with simple_html_dom. On the page I'm scraping there's a table with rows, and inside those, a bunch of cells. I want to get the contents of the third cell in each row. The cell in question doesn't have a class.
<tr class="thisrow">
<td class="firstcell"><strong>1st</strong></td>
<td class="secondcell">nothing in here</td>
<td><strong>blah blah</strong></td>
<td>something else</td>
</tr>
So to get started, I went straight for the third cell:
foreach($html->find('tr.thisrow td:nth-child(3)') as $thirdcell) {
    echo $thirdcell->innertext; // this works, no problem!
}
But then I realised I also needed data from another cell in the row (td.firstcell). That cell has a class, so I thought it best to loop through the rows and then use selectors within the context of each row:
foreach($html->find('tr.thisrow') as $row) {
    // passing 0 returns the first match instead of an array
    $thirdcell = $row->find('td:nth-child(3)', 0);
    echo $thirdcell; // this is now empty
    $firstcell = $row->find('td.firstcell', 0);
    echo $firstcell; // this works!
}
As you can see, my nth-child selector stops working inside the context of the row loop. What am I missing?
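It is a limitation of simple_html_dom. Apparently it can deal with nth-child selectors, but only when the parent of the matched element is in the tree below the node on which you apply find. Inside the loop, the parent of each td is the row itself, so the selector finds nothing.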
But it is a valid selector, as the equivalent JavaScript shows:
for (var row of [...document.querySelectorAll('tr.thisrow')]) {
    var thirdcell = row.querySelectorAll('td:nth-child(3)');
    console.log(thirdcell[0].textContent); // this works!
}
<table border=1>
<tr class="thisrow">
<td class="firstcell"><strong>1st</strong></td>
<td class="secondcell">nothing in here</td>
<td><strong>blah blah</strong></td>
<td>something else</td>
</tr>
</table>
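For comparison, the same row-relative lookup also works with PHP's built-in DOM extension, which is a different parser from simple_html_dom. The sketch below assumes the dom extension is available and uses DOMXPath: it selects the rows first, then evaluates td[3] and td[@class="firstcell"] relative to each row. td[3] picks the third td child, which matches td:nth-child(3) here because every child of the row is a td. The helper name collectCells is hypothetical, and matching @class="thisrow" exactly is a simplification of the CSS class selector.

```php
<?php
// Sketch using PHP's built-in DOM extension (DOMDocument + DOMXPath),
// NOT simple_html_dom. The HTML is the sample table from the question.
$html = <<<'HTML'
<table>
<tr class="thisrow">
<td class="firstcell"><strong>1st</strong></td>
<td class="secondcell">nothing in here</td>
<td><strong>blah blah</strong></td>
<td>something else</td>
</tr>
</table>
HTML;

/**
 * Returns [first cell text, third cell text] for every matching row.
 * collectCells is a hypothetical helper name used only for this example.
 */
function collectCells(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings about loose HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];
    // Select the rows first (exact class match is a simplification)...
    foreach ($xpath->query('//tr[@class="thisrow"]') as $row) {
        // ...then query relative to each row; td[3] is the third td child.
        $first = $xpath->query('td[@class="firstcell"]', $row)->item(0);
        $third = $xpath->query('td[3]', $row)->item(0);
        $result[] = [
            $first !== null ? trim($first->textContent) : null,
            $third !== null ? trim($third->textContent) : null,
        ];
    }
    return $result;
}

foreach (collectCells($html) as [$first, $third]) {
    echo $first, ' | ', $third, PHP_EOL; // prints "1st | blah blah"
}
```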
As a workaround you could use the array index on the find('td') result:
foreach($html->find('tr.thisrow') as $row) {
    $cells = $row->find('td');
    echo $cells[2]; // this works (index 2 = third cell)
}
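The same index-based idea carries over to PHP's built-in DOM extension, if switching parsers is an option. This is a sketch under that assumption, not simple_html_dom code: getElementsByTagName('td') collects descendant cells in document order, so item(2) is the third one. Like find('td'), it would also count cells of a nested table, and the exact class="thisrow" check is a simplification.

```php
<?php
// Built-in DOM analogue of indexing the find('td') result (assumes ext-dom).
$doc = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about loose HTML
$doc->loadHTML('<table><tr class="thisrow">'
    . '<td class="firstcell"><strong>1st</strong></td>'
    . '<td class="secondcell">nothing in here</td>'
    . '<td><strong>blah blah</strong></td>'
    . '<td>something else</td></tr></table>');
libxml_clear_errors();

$thirdCells = [];
foreach ($doc->getElementsByTagName('tr') as $row) {
    if ($row->getAttribute('class') !== 'thisrow') {
        continue; // only rows with exactly class="thisrow" (simplification)
    }
    // Descendant td elements in document order; item(2) is the third or null.
    $third = $row->getElementsByTagName('td')->item(2);
    if ($third !== null) {
        $thirdCells[] = trim($third->textContent);
    }
}
echo implode(PHP_EOL, $thirdCells), PHP_EOL;
```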
Or, alternatively, with children(), as the td elements are direct children of the tr:
foreach($html->find('tr.thisrow') as $row) {
    $cells = $row->children();
    echo $cells[2]; // this works (index 2 = third cell)
}
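The children() route can be mirrored in PHP's built-in DOM extension by keeping only element nodes from childNodes, which skips any whitespace text between the cells. elementChildren is a hypothetical helper written for this sketch; it is not part of either library, and the code assumes the dom extension is available.

```php
<?php
// Hypothetical helper: element children of a node, skipping text/comments,
// roughly what simple_html_dom's children() returns.
function elementChildren(DOMNode $node): array
{
    $children = [];
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $children[] = $child;
        }
    }
    return $children;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about loose HTML
$doc->loadHTML('<table><tr class="thisrow">'
    . '<td class="firstcell"><strong>1st</strong></td> '
    . '<td class="secondcell">nothing in here</td> '
    . '<td><strong>blah blah</strong></td> '
    . '<td>something else</td></tr></table>');
libxml_clear_errors();

$thirdCells = [];
foreach ($doc->getElementsByTagName('tr') as $row) {
    if ($row->getAttribute('class') !== 'thisrow') {
        continue; // exact class match is a simplification
    }
    $cells = elementChildren($row);
    if (isset($cells[2])) { // zero-based: index 2 is the third cell
        $thirdCells[] = trim($cells[2]->textContent);
    }
}
echo implode(PHP_EOL, $thirdCells), PHP_EOL;
```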
You can use the children($int) method; $int starts at 0.
Try this:
$row = $html->find('tr.thisrow', 0);
$firstcell = $row->children(0)->innertext;
$thirdcell = $row->children(2)->innertext;
You also have first_child(), last_child(), parent(), next_sibling() and prev_sibling().
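next_sibling() is handy when the cells you need are adjacent. As a rough analogue in PHP's built-in DOM extension, the hypothetical helper nextElement below walks nextSibling until it reaches an element, so starting from td.firstcell two steps land on the third cell. This assumes the dom extension is available and is only a sketch, not simple_html_dom's implementation.

```php
<?php
// Hypothetical helper: next element sibling, skipping text/comment nodes
// (similar in spirit to simple_html_dom's next_sibling()).
function nextElement(DOMNode $node): ?DOMElement
{
    for ($n = $node->nextSibling; $n !== null; $n = $n->nextSibling) {
        if ($n instanceof DOMElement) {
            return $n;
        }
    }
    return null;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about loose HTML
$doc->loadHTML('<table><tr class="thisrow">'
    . '<td class="firstcell"><strong>1st</strong></td>'
    . '<td class="secondcell">nothing in here</td>'
    . '<td><strong>blah blah</strong></td>'
    . '<td>something else</td></tr></table>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$first = $xpath->query('//td[@class="firstcell"]')->item(0);
// firstcell -> secondcell -> third cell, guarding against missing siblings
$second = $first !== null ? nextElement($first) : null;
$third = $second !== null ? nextElement($second) : null;

echo $third !== null ? trim($third->textContent) : '', PHP_EOL;
```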