Reputation: 120
Example HTML5 to parse:
<div id="orderDetails">
<div> ... any number of blocks with unnecessary stuff ... </div>
<div>Label for important info</div>
<table> ... some other block type ... </table>
<div>Some very important info here</div>
<div> ... any number of blocks with unnecessary stuff ... </div>
</div>
My PHP code looks like this:
$crawler = new \Symfony\Component\DomCrawler\Crawler($html);
$label = $crawler->filter('#orderDetails div:contains("Label for important info")');
$info = $label->parent()->next('div');
assert('Some very important info here' === $info->text(), 'Important info must be grabbed from HTML');
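Unfortunately, Crawler has no parent() or next() methods. It does have parents(), but that gives me all ancestor nodes, i.e. all the divs, which I cannot tell apart.
So I have two questions in this case:
- How to get the parent of the current node? Not all ancestors, just the direct one.
- How to traverse the DOM horizontally with some analogue of next/prev?
Thanks.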
Upvotes: 4
Views: 5751
After some digging into the source code, I found that nextAll() does not work on the whole selection: it starts from just one node, the first in the current Crawler ($node = $this->getNode(0);), and returns every element sibling that follows it. previousAll() and parents() behave the same way, and all three list the nearest node first.
That means the immediate neighbour is ->first() of the result, and the node that comes after skipping two siblings is $node->nextAll()->eq(2).
The naming is confusing, because there is no singular parent()/next()/prev().
- How to get parent of current node? Not all nodes but "actual" one!
// parents() returns every ancestor, nearest first, so first() is the direct parent
$parent = $node->parents()->first();
- How to traverse dom horizontally with some analogue of next/prev?
// Only the next sibling after the current node
$next = $node->nextAll()->first();
// Only the previous sibling before the current node
$prev = $node->previousAll()->first();
// The sibling that comes after skipping two from the current node
$nextAfterTwo = $node->nextAll()->eq(2);
So the needed functionality already exists, and a function that solves the question looks like this:
use Symfony\Component\DomCrawler\Crawler;

/**
 * Returns the first node after the current one that matches the selector
 *
 * Note: filter() also looks inside the following siblings, so a matching
 * descendant of a sibling can be returned as well.
 *
 * @param Crawler $start Node from which to start traversing
 * @param string $selector CSS selector like in `Crawler::filter($selector)`
 *
 * @return Crawler Found node wrapped with Crawler
 *
 * @throws \InvalidArgumentException When node not found
 */
function getNextFiltered(Crawler $start, string $selector) : Crawler
{
    // nextAll() already holds every element sibling after $start, so no loop is needed
    $filtered = $start->nextAll()->filter($selector);
    if (0 === $filtered->count()) {
        throw new \InvalidArgumentException('No node found');
    }
    // Matches are kept in document order, so first() is the closest one
    return $filtered->first();
}
And in my example:
$crawler = new Crawler($html);
$label = $crawler->filter('#orderDetails div:contains("Label for important info")');
$info = getNextFiltered($label, 'div');
assert('Some very important info here' === $info->text(), 'Important info must be grabbed from HTML');
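For anyone who wants to try this end to end, here is a minimal sketch rather than a definitive implementation. It assumes symfony/dom-crawler and symfony/css-selector are installed through Composer and that the script sits next to vendor/. The filler blocks in the sample markup are made up. As far as I know, newer DomCrawler releases renamed parents() to ancestors(), so the sketch checks which one is available.

```php
<?php
// Assumption: composer require symfony/dom-crawler symfony/css-selector
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Sample markup modelled on the question; the filler blocks are invented.
$html = <<<'HTML'
<div id="orderDetails">
  <div>Unrelated block</div>
  <div>Label for important info</div>
  <table><tr><td>Some other block type</td></tr></table>
  <div>Some very important info here</div>
  <div>Another unrelated block</div>
</div>
HTML;

$crawler = new Crawler($html);
$label = $crawler->filter('#orderDetails div:contains("Label for important info")');

// Direct parent: the ancestor list starts with the nearest node.
// ancestors() is used when it exists, otherwise the older parents().
$ancestors = method_exists($label, 'ancestors') ? $label->ancestors() : $label->parents();
$parent = $ancestors->first();

// Immediate element neighbours on each side of the label.
$next = $label->nextAll()->first();
$prev = $label->previousAll()->first();

// First following <div>, skipping the <table> in between.
$info = $label->nextAll()->filter('div')->first();

echo $parent->attr('id'), PHP_EOL; // orderDetails
echo $next->nodeName(), PHP_EOL;   // table
echo $prev->text(), PHP_EOL;       // Unrelated block
echo $info->text(), PHP_EOL;       // Some very important info here
```

The last lookup is the same idea as getNextFiltered($label, 'div'), written inline.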
Upvotes: 7