Reputation: 1213
I'm trying to scrape some content from a site. I eventually discovered that it requires cookies, so I solved that with the Guzzle cookie plugin. It's strange: I cannot get the content by doing a var_dump, but it will show the page if I do an echo, which makes me think there is some dynamic data call that gets the data. I'm quite used to working with APIs through Guzzle, but I'm not sure how I should treat this. Thanks.
If I use DomCrawler I get an error.
Code -
use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\DomCrawler\Crawler;
use Guzzle\Http\Client;
use Guzzle\Plugin\Cookie\CookiePlugin;
use Guzzle\Plugin\Cookie\CookieJar\ArrayCookieJar;
$cookiePlugin = new CookiePlugin(new ArrayCookieJar());
$url = 'http://www.myurl.com';
// Add the cookie plugin to a client
$client = new Client();
$client->addSubscriber($cookiePlugin);
// Send the request with no cookies and parse the returned cookies
$client->get($url)->send();
// Send the request again, noticing that cookies are being sent
$request = $client->get($url);
$response = $request->send();
var_dump($response);
$crawler = new Crawler($response);
foreach ($crawler as $domElement) {
print $domElement->filter('a')->links();
}
Error -
Expecting a DOMNodeList or DOMNode instance, an array, a string, or null, but got "Guzzle\Http\Message\Response".
Upvotes: 1
Views: 2763
Reputation: 2047
If you instantiate your crawler object like $crawler = new Crawler($response); you will receive all kinds of URI-based errors when you attempt to use any of the Form or Link based functions / features of the Crawler object.
I recommend instantiating your Crawler object like:
$crawler = new Symfony\Component\DomCrawler\Crawler(null, $response->getEffectiveUrl());
$crawler->addContent(
$response->getBody()->__toString(),
$response->getHeader('Content-Type')
);
This is also how the Symfony\Component\BrowserKit\Client does it within the createCrawlerFromContent method. The Symfony\Component\BrowserKit\Client is used internally by Goutte.
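Putting that together with the cookie setup from the question, here is a minimal sketch (assuming Guzzle 3, Symfony DomCrawler, and the question's placeholder URL) of fetching the page with cookies and then pulling out the link URIs:
use Guzzle\Http\Client;
use Guzzle\Plugin\Cookie\CookiePlugin;
use Guzzle\Plugin\Cookie\CookieJar\ArrayCookieJar;
use Symfony\Component\DomCrawler\Crawler;
$url = 'http://www.myurl.com'; // placeholder URL from the question
// Attach the cookie plugin before sending anything, so cookies from the
// first response are stored and replayed on the next request
$client = new Client();
$client->addSubscriber(new CookiePlugin(new ArrayCookieJar()));
$client->get($url)->send();             // primes the cookie jar
$response = $client->get($url)->send(); // sent with the stored cookies
// Give the crawler the HTML string plus the effective URL as base URI,
// so the Link / Form helpers can resolve relative hrefs
$crawler = new Crawler(null, $response->getEffectiveUrl());
$crawler->addContent(
    $response->getBody()->__toString(),
    (string) $response->getHeader('Content-Type')
);
// List the absolute URI of every <a> on the page
foreach ($crawler->filter('a')->links() as $link) {
    echo $link->getUri(), PHP_EOL;
}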
Upvotes: 1
Reputation: 4310
Try this (Guzzle 4/5, where the body is a stream and getContents() returns it as a string):
$crawler = new Crawler($response->getBody()->getContents());
http://docs.guzzlephp.org/en/latest/http-messages.html#id2
http://docs.guzzlephp.org/en/latest/streams.html#creating-streams
Or, with Guzzle 3, cast the body to a string first:
$crawler = new Crawler((string) $response->getBody());
http://guzzle3.readthedocs.org/http-client/response.html#response-body
Update
Basic usage of Guzzle 5 with the getContents method.
include 'vendor/autoload.php';
use GuzzleHttp\Client;
$client = new Client();
echo $client->get('http://stackoverflow.com')->getBody()->getContents();
The rest is in the docs (including cookies).
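For the original problem, a similar sketch with Guzzle 5, where cookie handling goes through a GuzzleHttp\Cookie\CookieJar passed as a request option rather than the old plugin (the URL is again the question's placeholder; this is a sketch, not a drop-in solution):
include 'vendor/autoload.php';
use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;
use Symfony\Component\DomCrawler\Crawler;
$url = 'http://www.myurl.com'; // placeholder URL from the question
// One jar shared across requests keeps the session cookies
$jar = new CookieJar();
$client = new Client();
$response = $client->get($url, ['cookies' => $jar]);
// getContents() drains the response stream into a plain string for the crawler
$crawler = new Crawler(null, $url);
$crawler->addContent($response->getBody()->getContents());
foreach ($crawler->filter('a')->links() as $link) {
    echo $link->getUri(), PHP_EOL;
}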
Upvotes: 4