Mark888

Reputation: 53

Goutte - dom crawler - remove node

I have this HTML on my site (http://testsite.com/test.php):

<div class="first">
  <div class="second">
     <a href="/test.php">click</a>
     <span>back</span>
  </div>
</div>
<div class="first">
  <div class="second">
     <a href="/test.php">click</a>
     <span>back</span>
  </div>
</div>

I would like to receive:

<div class="first">
  <div class="second">
     <a href="/test.php">click</a>
  </div>
</div>
<div class="first">
  <div class="second">
     <a href="/test.php">click</a>
  </div>
</div>

So I would like to remove the span. I use Goutte in Symfony2, based on http://symfony.com/doc/current/components/dom_crawler.html:

    $client = new Client();
    $crawler = $client->request('GET', 'http://testsite.com/test.php');

    $crawler->filter('.first .second')->each(function ($node) {
        //??????
    });

Upvotes: 5

Views: 5610

Answers (2)

kaseOga

Reputation: 781

To remove a node from the Crawler's node list, the anonymous function passed to reduce() must return false.

Just return false inside the reducer and the $node will be dropped from the Crawler that reduce() returns.

Upvotes: -1

Jakub Zalas

Reputation: 36191

As explained in the docs:

The DomCrawler component eases DOM navigation for HTML and XML documents.

and also:

While possible, the DomCrawler component is not designed for manipulation of the DOM or re-dumping HTML/XML.

DomCrawler is designed to extract details from DOM documents rather than modifying them.

However...

Since PHP passes objects by handle (changes made to an object inside a function are visible outside it), and Crawler is basically a wrapper around DOMNode objects, it's technically possible to modify the underlying DOM document:

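use Symfony\Component\DomCrawler\Crawler;

// will remove all span nodes inside .second nodes
$crawler->filter('.first .second span')->each(function (Crawler $crawler) {
    // each() hands over a Crawler wrapping a single node; iterating it yields the DOMNode
    foreach ($crawler as $node) {
        $node->parentNode->removeChild($node);
    }
});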

Here's a working example: https://gist.github.com/jakzal/8dd52d3df9a49c1e5922
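
For comparison, here is a minimal sketch of the same removal using only PHP's bundled DOM extension, with no Goutte or DomCrawler involved. The markup is copied from the question. The XPath expression is an assumed hand-written equivalent of the `.first .second span` selector, not something DomCrawler is known to generate.

```php
<?php
// Sketch only: plain DOMDocument/DOMXPath, markup taken from the question.
$html = <<<'HTML'
<div class="first">
  <div class="second">
     <a href="/test.php">click</a>
     <span>back</span>
  </div>
</div>
<div class="first">
  <div class="second">
     <a href="/test.php">click</a>
     <span>back</span>
  </div>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// Assumed XPath equivalent of the CSS selector ".first .second span"
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " first ")]'
       . '//div[contains(concat(" ", normalize-space(@class), " "), " second ")]'
       . '//span';

// Copy the matches into an array before removing anything,
// so the removal cannot interfere with the iteration.
$spans = iterator_to_array($xpath->query($query));

foreach ($spans as $span) {
    $span->parentNode->removeChild($span);
}

// Dump the modified document (loadHTML wraps the fragment in <html><body>)
echo $doc->saveHTML();
```

The removal step is the same `parentNode->removeChild()` call as in the Crawler version. Only the way the nodes are selected differs.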

Upvotes: 5
