GAV

Reputation: 1213

Guzzle response can't be used with Domcrawler()

I'm trying to scrape some content from a site. I eventually discovered that it requires cookies, so I solved that with the Guzzle cookie plugin. Strangely, var_dump does not show me the page content, but echo does display the page, which makes me think the data is fetched by some dynamic call. I'm quite used to working with APIs through Guzzle, but I'm not sure how I should treat this. Thanks.

If I use DomCrawler, I get an error.

Code:

    use Symfony\Bundle\FrameworkBundle\Controller\Controller;
    use Symfony\Component\DomCrawler\Crawler;
    use Guzzle\Http\Client;
    use Guzzle\Plugin\Cookie\CookiePlugin;
    use Guzzle\Plugin\Cookie\CookieJar\ArrayCookieJar;

    $cookiePlugin = new CookiePlugin(new ArrayCookieJar());

    $url = 'http://www.myurl.com';

    // Add the cookie plugin to a client
    $client = new Client();
    $client->get();
    $client->addSubscriber($cookiePlugin);

    // Send the request with no cookies and parse the returned cookies
    $client->get($url)->send();

    // Send the request again, noticing that cookies are being sent
    $request = $client->get($url);
    $response = $request->send();

    var_dump($response);
    $crawler = new Crawler($response);

    foreach ($crawler as $domElement) {
        print $domElement->filter('a')->links();
    }

Error:

    Expecting a DOMNodeList or DOMNode instance, an array, a string, or null, but got "Guzzle\Http\Message\Response".

Upvotes: 1

Views: 2763

Answers (2)

Shaun Bramley

Reputation: 2047

If you instantiate your crawler object like $crawler = new Crawler($response); you will receive all kinds of URI-based errors when you attempt to use any of the form- or link-based features of the Crawler object, because the crawler does not know the URI the content came from.

I recommend instantiating your Crawler object like:

$crawler = new Symfony\Component\DomCrawler\Crawler(null, $response->getEffectiveUrl());

$crawler->addContent(
    $response->getBody()->__toString(),
    $response->getHeader('Content-Type')
);

This is also how Symfony\Component\BrowserKit\Client does it within its createCrawlerFromContent method. Symfony\Component\BrowserKit\Client is used internally by Goutte.

Upvotes: 1

kba

Reputation: 4310

Try this:

For Guzzle 5

$crawler = new Crawler($response->getBody()->getContents());

http://docs.guzzlephp.org/en/latest/http-messages.html#id2

http://docs.guzzlephp.org/en/latest/streams.html#creating-streams

For Guzzle 3

// Passing true makes getBody() return the body as a string instead of an EntityBody object
$crawler = new Crawler($response->getBody(true));

http://guzzle3.readthedocs.org/http-client/response.html#response-body

Update

Basic usage of Guzzle 5 with the getContents method:

include 'vendor/autoload.php';

use GuzzleHttp\Client;

$client = new Client();
echo $client->get('http://stackoverflow.com')->getBody()->getContents();

The rest is covered in the documentation (including cookies).
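To illustrate why echo prints the page while the Crawler rejects the same value, here is a minimal sketch in plain PHP. FakeResponse and describeInput are hypothetical names made up for this illustration; they are not part of Guzzle or Symfony, and the type check only approximates what the Crawler does. The point is that an object with a __toString method can be echoed, yet it is still not a string until you cast it or ask for its body.

```php
<?php
// Hypothetical stand-in for a response object: it can be converted to a
// string, but it is not itself a string.
class FakeResponse
{
    private $body;

    public function __construct($body)
    {
        $this->body = $body;
    }

    // echo and (string) casts call this method implicitly.
    public function __toString()
    {
        return $this->body;
    }
}

// Illustrative type check (an assumption, not the Crawler's actual source):
// reports 'string' for strings and the class name for objects.
function describeInput($input)
{
    return is_string($input) ? 'string' : get_class($input);
}

$response = new FakeResponse('<a href="/about">About</a>');

echo describeInput($response), "\n";          // prints "FakeResponse": an object, so a string check fails
echo describeInput((string) $response), "\n"; // prints "string": the cast yields plain text
```

This is also why var_dump shows the object's internal structure rather than the HTML: it dumps the object itself and never calls __toString.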

Upvotes: 4
