Reputation: 7694
Apify's legacy Crawler had a randomWaitBetweenRequests option:
This option forces the crawler to ensure a minimum time interval between opening two web pages, in order to prevent it from overloading the target server.
Do Apify Actors have a similar setting? If so, how does it affect the Actor Units computation?
Upvotes: 0
Views: 1355
Reputation: 468
There is no such option in apify/web-scraper, the Actor that is meant to replace the legacy Crawler.
But you can implement it yourself in the pageFunction: call context.waitFor() and pass it a random time in milliseconds.
async function pageFunction(context) {
    const { request, log, jQuery } = context;

    // To be able to use jQuery as $, save it into a variable
    // and select the inject jQuery option. We've selected it for you.
    const $ = jQuery;
    const title = $('title').text();
    log.info(`URL: ${request.url} TITLE: ${title}`);

    // Wait for the number of milliseconds returned by getRandomWait(),
    // a helper you need to define yourself inside pageFunction.
    await context.waitFor(getRandomWait());

    // To save data, just return an object with the requested properties.
    return {
        url: request.url,
        title,
    };
}
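The pageFunction above calls getRandomWait() but does not define it. A minimal sketch of such a helper is shown here, assuming you want a uniformly distributed whole number of milliseconds between two bounds. The parameter names minMs and maxMs and the default values are illustrative assumptions, not settings taken from the legacy Crawler.

```javascript
// Hypothetical helper (not part of the Apify API): returns a random integer
// delay in milliseconds, uniformly distributed in [minMs, maxMs].
// The default bounds of 1000 and 5000 ms are assumptions for illustration.
function getRandomWait(minMs = 1000, maxMs = 5000) {
    if (!Number.isInteger(minMs) || !Number.isInteger(maxMs) || minMs < 0 || maxMs < minMs) {
        throw new RangeError('Expected integers with 0 <= minMs <= maxMs');
    }
    // Math.random() is in [0, 1), so the result stays within both bounds.
    return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}
```

Because the Web Scraper runs pageFunction in the browser context, the helper most likely has to be declared inside the body of pageFunction rather than next to it, for example just above the call to context.waitFor().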
If you would like to have this option in apify/web-scraper, you can submit an issue in its GitHub repository.
Upvotes: 3