Reputation: 168
I've been working on a script that makes close to a thousand async requests using getAsync and Promise\settle. Each requested page is then parsed using the Symfony Crawler filter method (also slow, but a separate issue).
My code looks something like this:
$requestArray = [];
$request = new Client($url);
foreach ($thousandItemArray as $item) {
    $requestArray[] = $request->getAsync(null, $query);
}
// Wait for every promise in the array to settle before crawling anything.
$results = Promise\settle($requestArray)->wait(true);
foreach ($results as $item) {
    $item->crawl();
}
Is there a way I can crawl the requested pages as they come in, rather than waiting for them all and then crawling? Am I right in thinking this would speed things up, if it is possible?
Thanks for your help in advance.
Upvotes: 0
Views: 1152
Reputation: 5010
You can. getAsync() returns a promise, so you can assign an action to it using ->then().
$promisesList[] = $request->getAsync(/* ... */)->then(
    function (Response $resp) {
        // Do whatever you want right after the response is available.
    }
);
// Pass the list of promises (not the client) to settle().
$results = Promise\settle($promisesList)->wait(true);
P.S. You probably want to limit the concurrency level to some number of requests (so that not all the requests start at once). If so, use the each_limit() function instead of settle. And vote for my PR to be able to use settle_limit() ;)
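For illustration, here is a rough sketch of how each_limit() could be combined with the crawling step. It assumes Guzzle 6 with guzzlehttp/promises 1.x (where the namespaced function each_limit() is available) and symfony/dom-crawler plus symfony/css-selector installed via Composer. The base URI, the query arrays, the 'title' selector and the concurrency of 10 are placeholders, not values from the question.

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Promise;
use Psr\Http\Message\ResponseInterface;
use Symfony\Component\DomCrawler\Crawler;

// Placeholder inputs (assumptions for this sketch).
$url = 'https://example.com/';
$thousandItemArray = [['page' => 1], ['page' => 2], ['page' => 3]];

$client = new Client(['base_uri' => $url]);

// A generator creates each promise lazily, so a request is only started
// when each_limit() pulls the next item. Building the whole array up
// front would start every request immediately.
$requests = function () use ($client, $thousandItemArray) {
    foreach ($thousandItemArray as $key => $query) {
        yield $key => $client->getAsync('', ['query' => $query]);
    }
};

$titles = [];

Promise\each_limit(
    $requests(),
    10, // at most 10 requests in flight at a time (arbitrary choice)
    function (ResponseInterface $response, $key) use (&$titles) {
        // Runs as soon as this particular response has arrived.
        $crawler = new Crawler((string) $response->getBody());
        $node = $crawler->filter('title');
        $titles[$key] = $node->count() > 0 ? $node->text() : null;
    },
    function ($reason, $key) {
        // Runs for a failed request; the other requests keep going.
        error_log("Request $key failed: " . $reason->getMessage());
    }
)->wait();
```

Note that the parsing itself still runs on a single PHP thread, so the gain, if any, comes from crawling finished pages while other requests are still in flight rather than from parallel parsing.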
Upvotes: 2