Reputation: 2907
I want to extract data from HTML with this structure:
<html>
<body>
<table>
<tr>
<td>
<table>
<tr>
<td>
<table>
<tr>
<td>
<table>
<tr>
<td>TD1
<table>
<tr>
<td>TD2
<table>
<tr>
<td>TD3</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>
I would like to get this text result once:
TD1 TD2 TD3
When I try with PHP Simple HTML DOM Parser:
foreach ($html->find('body + table + table + table + table') as $element)
    echo $element->innertext . '<br>';
I get this result:
TD1 TD2 TD3
TD2 TD3
TD3
It seems the PHP DOM library doesn't work with the + CSS selector, so it finds the element "body + table + table + table + table" several times instead of only the immediate one, body > table > table > table > table.
How can I get only the outer tags once, so that the result is TD1 TD2 TD3? This structure occurs multiple times on the same page, so I'm looking for something similar to the + CSS selector that gets every occurrence of body + table + table + table + table on the page.
Upvotes: 0
Views: 304
Reputation: 1447
You could try Symfony's DomCrawler component. Its filter() method accepts CSS selectors (with a few minor exceptions, see here).
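If adding a dependency is not an option, roughly the same result can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of DomCrawler. This is a different technique from the one suggested above: rather than a CSS selector, it uses an XPath expression that picks every table with exactly three table ancestors, i.e. the fourth nesting level. The $html string is a shortened stand-in for the page in the question, and the depth of 3 is an assumption taken from that sample.

```php
<?php
// Assumption: $html is a shortened copy of the structure in the question.
$html = <<<HTML
<html><body>
<table><tr><td>
 <table><tr><td>
  <table><tr><td>
   <table><tr><td>TD1
    <table><tr><td>TD2
     <table><tr><td>TD3</td></tr></table>
    </td></tr></table>
   </td></tr></table>
  </td></tr></table>
 </td></tr></table>
</td></tr></table>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Match only tables that sit at the fourth nesting level:
// exactly three <table> ancestors, no more and no fewer.
$results = [];
foreach ($xpath->query('//table[count(ancestor::table) = 3]') as $table) {
    // textContent includes the text of all nested tables;
    // collapse runs of whitespace into single spaces.
    $results[] = trim(preg_replace('/\s+/', ' ', $table->textContent));
}

foreach ($results as $text) {
    echo $text, "<br>\n";
}
```

Because the deeper tables have four or five table ancestors, they are not matched on their own, so each occurrence of the structure should be reported once. If the same structure appears several times on the page, each fourth-level table becomes one entry in $results.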
Upvotes: 0