trogwar

Reputation: 120

How to get the next node after the parent of the current node via the Symfony DomCrawler?

Example HTML5 to parse:

<div id="orderDetails">
    <div> ... any number of blocks with unnecessary stuff ... </div>
    <div>Label for important info</div>
    <table> ... some other block type ... </table>
    <div>Some very important info here</div>
    <div> ... any number of blocks with unnecessary stuff ... </div>
</div>

My PHP code looks like this:

$crawler = new \Symfony\Component\DomCrawler\Crawler($html);
$label = $crawler->filter('#orderDetails div:contains("Label for important info")');
$info = $label->parent()->next('div');
assert('Some very important info here' === $info->text(), 'Important info must be grabbed from HTML');

Unfortunately, the crawler has no parent() or next() methods. It does have parents(), but that gives me all ancestor nodes, i.e. a set of divs that I cannot tell apart.

So I have two questions:

  1. How do I get the parent of the current node? Not all ancestors, just the direct one.
  2. How do I traverse the DOM horizontally, with some analogue of next/prev?

Thanks.

Upvotes: 4

Views: 5751

Answers (1)

trogwar

Reputation: 120

Story

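After some digging into the source code, I found where my confusion came from. nextAll() does not work on every node of the selection: it takes only the first one ($node = $this->getNode(0);) and returns all following siblings of that node, nearest first.

That means the sibling right after the current node is $node->nextAll()->first(), and the one two positions further is $node->nextAll()->eq(2). Chaining works as well, because each call starts again from the first node of the previous result, and text() reads the first node too: $node->nextAll()->nextAll()->nextAll().

The naming is still confusing at first sight (0_0).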

Answers

  1. How to get the parent of the current node? Not all ancestors, just the direct one.
// parents() returns every ancestor, nearest first, so first() is the direct parent.
// Newer Symfony versions deprecate parents() in favor of ancestors().
$parent = $node->parents()->first();
  2. How to traverse the DOM horizontally with some analogue of next/prev?
// The next sibling after the current node
$next = $node->nextAll()->first();
// The previous sibling before the current node
$prev = $node->previousAll()->first();
// The third sibling after the current node (two siblings skipped)
$nextAfterTwo = $node->nextAll()->eq(2);

Concrete code solution

Since the needed building blocks do exist, a function that solves the question looks like this:

use Symfony\Component\DomCrawler\Crawler;

/**
 * Returns the nodes that follow the current node and match the selector
 *
 * @param Crawler $start    Node from which to start traversing
 * @param string  $selector CSS selector like in `Crawler::filter($selector)`
 *
 * @return Crawler Matching nodes wrapped in a Crawler, in document order
 *
 * @throws \InvalidArgumentException When no node is found
 */
function getNextFiltered(Crawler $start, string $selector) : Crawler
{
    // nextAll() collects every following sibling of the first node in $start,
    // so a single call already covers the rest of the parent's children.
    // Note: filter() also matches descendants of those siblings, not only
    // the siblings themselves.
    $filtered = $start->nextAll()->filter($selector);
    if ($filtered->count()) {
        return $filtered;
    }

    throw new \InvalidArgumentException('No node found');
}

And in my example:

$crawler = new Crawler($html);
$label   = $crawler->filter('#orderDetails div:contains("Label for important info")');
$info    = getNextFiltered($label, 'div');
assert('Some very important info here' === $info->text(), 'Important info must be grabbed from HTML');
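
For comparison, here is a minimal sketch of the same lookup without the crawler, using PHP's built-in DOM extension and XPath axes (parentNode for the direct parent, following-sibling and preceding-sibling for horizontal traversal). This is a swapped-in technique, not what the function above does. The filler text inside the sample HTML is made up for illustration, since the original markup elides it. An XPath expression of this kind that starts with // can presumably also be passed to the crawler's filterXPath(), but I have not verified that here.

```php
<?php
// Sample markup from the question; the filler blocks are hypothetical stand-ins
// for the "unnecessary stuff" that the original elides.
$html = <<<'HTML'
<div id="orderDetails">
    <div>Unrelated block</div>
    <div>Label for important info</div>
    <table><tr><td>Some other block type</td></tr></table>
    <div>Some very important info here</div>
    <div>Another unrelated block</div>
</div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Locate the label div by its exact (whitespace-normalized) text.
$label = $xpath->query(
    '//div[@id="orderDetails"]/div[normalize-space(.) = "Label for important info"]'
)->item(0);

// Direct parent: a plain DOM property, no ancestor list involved.
$parent = $label->parentNode;

// Nearest following sibling that is a div (the table in between is skipped).
$info = $xpath->query('following-sibling::div[1]', $label)->item(0);

// Nearest preceding sibling element of any type.
$prev = $xpath->query('preceding-sibling::*[1]', $label)->item(0);

echo $parent->getAttribute('id'), "\n";   // orderDetails
echo trim($info->textContent), "\n";      // Some very important info here
echo trim($prev->textContent), "\n";      // Unrelated block
```

The [1] predicate counts from the context node outward on both axes, so it always picks the nearest sibling in the given direction.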

Upvotes: 7
