Arran

Reputation: 154

Goutte: extract text including HTML tags

I'm learning to use Goutte to scrape descriptions from websites. It does retrieve the text, but it strips all the tags (e.g. <br>, <b>). Is there a way to retrieve the full contents of the div, including the HTML tags? Or is there an easier alternative that gives me this ability?

    <?php
    require_once "vendor/autoload.php";

    use Goutte\Client;

    // Initialise a new client
    $client = new Client();
    $crawler = $client->request('GET', "https://examplesite.com/example");

    // Crawl the response: extract('_text') returns only the text content of each match
    $description = $crawler->filter('element.class')->extract('_text');
    ?>

Upvotes: 3

Views: 3360

Answers (1)

Yoann

Reputation: 5077

You can use the crawler's html() method, which returns the node's inner HTML with the tags preserved:

http://api.symfony.com/4.0/Symfony/Component/DomCrawler/Crawler.html#method_html

Like this

    $descriptions = $crawler->filter('element.class')->each(function ($node) {
        // html() returns the inner HTML of the node, tags included
        return $node->html();
    });
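If you want to see what "inner HTML" means without installing Goutte, the same idea can be sketched with PHP's built-in DOM extension. This is an illustrative alternative, not how Goutte or DomCrawler implement html(); the sample markup, the class name description and the helper innerHtml() are hypothetical.

```php
<?php
// Hypothetical helper: serialize every child node of an element,
// i.e. its inner HTML with the tags preserved.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Hypothetical sample page standing in for the scraped response.
$page = '<html><body><div class="description">Line one<br><b>bold</b></div></body></html>';

$doc = new DOMDocument();
// Collect parser warnings instead of printing them (real-world markup is often messy).
libxml_use_internal_errors(true);
$doc->loadHTML($page);
libxml_clear_errors();

// Select <div> elements whose class list contains "description".
$xpath = new DOMXPath($doc);
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " description ")]';

$descriptions = [];
foreach ($xpath->query($query) as $div) {
    $descriptions[] = innerHtml($div);
}

print_r($descriptions);
```

Each entry in $descriptions keeps the <br> and <b> tags, which is the same kind of result the html() call above gives you per matched node.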

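Once you have the HTML, you can use PHP's strip_tags() function to clean it up; its optional second argument lets you keep specific tags such as <br> and <b>.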

http://php.net/manual/fr/function.strip-tags.php
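A small sketch of that clean-up step, run on a hypothetical HTML fragment standing in for one item returned by html(). It compares strip_tags() with and without the allowed-tags argument; the array form of that argument assumes PHP 7.4 or later (older versions accept a string such as '<br><b>').

```php
<?php
// Hypothetical inner HTML, e.g. one element of the array built with html().
$html = 'Line one<br><b>bold</b> and <a href="#">a link</a>';

// Remove every tag, keeping only the text.
$plain = strip_tags($html);

// Remove every tag except <br> and <b>.
$kept = strip_tags($html, ['br', 'b']);

echo $plain, PHP_EOL;
echo $kept, PHP_EOL;
```

Note that stripping <br> joins the adjacent text directly ("Line onebold"), so keeping line-break tags or replacing them first may be preferable for descriptions.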

Upvotes: 3
