Reputation: 1558
I have the following method from my controller that gets the data from the site:
$goutteClient = new Client();
$guzzleClient = new GuzzleClient([
    'timeout' => 60,
]);
$goutteClient->setClient($guzzleClient);
$crawler = $goutteClient->request('GET', 'https://html.duckduckgo.com/html/?q=Laravel');
$crawler->filter('.result__title .result__a')->each(function ($node) {
    dump($node->text());
});
The above code gives me the titles of the search results. I also want to get the link of each corresponding search result, which resides in the class result__extras__url.
How do I get the link and the title at once? Or do I have to run another method for that?
Upvotes: 1
Views: 2651
Reputation: 3805
For parsing, I usually do the following:
$doc = new DOMDocument();
$doc->loadHTML($crawler->html()); // $crawler is the Symfony DomCrawler instance returned by Goutte
From then on, you can access elements with the getElementsByTagName function on your DOMDocument. For example:
$rows = $doc->getElementsByTagName('tr');
foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('td');
    $value = trim($cols->item(0)->nodeValue);
}
You can find more information at https://www.php.net/manual/en/class.domdocument.php
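To pair each title with its link using this DOM approach, one option is to loop over each result container and query inside it with DOMXPath. The following is only a rough sketch: the sample HTML is made up, and the wrapping `div` with class `result` is an assumption about the page structure (only `result__title`, `result__a` and `result__extras__url` come from the question), so adjust the expressions to the real markup.

```php
<?php
// Made-up sample markup; the real DuckDuckGo page may be structured differently.
$html = <<<'HTML'
<html><body>
<div class="result">
  <h2 class="result__title"><a class="result__a" href="https://example.com/a">First title</a></h2>
  <div class="result__extras__url"><a href="https://example.com/a">example.com/a</a></div>
</div>
<div class="result">
  <h2 class="result__title"><a class="result__a" href="https://example.org/b">Second title</a></h2>
  <div class="result__extras__url"><a href="https://example.org/b">example.org/b</a></div>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$results = [];

// Assumption: every search result sits in its own <div class="result">.
$containers = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " result ")]');
foreach ($containers as $container) {
    // Relative queries (leading ".") only search inside the current container.
    $titleNode = $xpath->query('.//a[contains(@class, "result__a")]', $container)->item(0);
    $urlNode = $xpath->query('.//*[contains(@class, "result__extras__url")]', $container)->item(0);
    if ($titleNode === null || $urlNode === null) {
        continue; // skip containers that lack either piece
    }
    $results[] = [
        'title' => trim($titleNode->textContent),
        'url'   => trim($urlNode->textContent),
    ];
}

print_r($results);
```

Querying relative to each container keeps the title and the link of one result together, so there is no need for a second pass over the document.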
Upvotes: 1
Reputation: 492
Try inspecting the attributes of the nodes. Once you have the href attribute, parse it to get the URL.
$crawler->filter('.result__title .result__a')->each(function ($node) {
    $parts = parse_url($node->attr('href'));
    parse_str($parts['query'], $params); // parse_str() URL-decodes the values itself
    $url = $params['uddg']; // DDG links to its own redirect URL and carries the actual URL in the uddg query parameter.
    $title = $node->text();
});
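The href parsing can also be isolated into a small function so it can be tried without hitting the site. This is a minimal sketch: `extractTargetUrl` is a hypothetical helper name, and the sample href is invented to match the shape described above (a redirect link with the real URL in `uddg`), not copied from an actual response.

```php
<?php
// Hypothetical helper, not part of Goutte or any DuckDuckGo API.
function extractTargetUrl(string $href): ?string
{
    // parse_url() also accepts protocol-relative URLs such as //host/path?query
    $query = parse_url($href, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string, or the URL could not be parsed
    }

    // parse_str() decodes percent-encoding, so the href must not be decoded beforehand
    parse_str($query, $params);

    return isset($params['uddg']) && is_string($params['uddg']) ? $params['uddg'] : null;
}

// Invented example href; "rut" stands in for any extra parameters on the link.
$href = '//duckduckgo.com/l/?uddg=https%3A%2F%2Fexample.com%2Fdocs%3Fa%3D1%26b%3D2&rut=abc';
echo extractTargetUrl($href), PHP_EOL;
```

Inside the each() callback you could then collect `['title' => $node->text(), 'url' => extractTargetUrl($node->attr('href'))]` for every result, which gives the title and the link in a single pass.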
Upvotes: 1