Skippy le Grand Gourou

Reputation: 7694

Delay between requests in Apify

Apify's legacy Crawler had a randomWaitBetweenRequests option:

This option forces the crawler to ensure a minimum time interval between opening two web pages, in order to prevent it from overloading the target server.

Do Apify Actors have a similar setting? If so, how does it affect the Actor Units computation?

Upvotes: 0

Views: 1355

Answers (1)

drobnikj

Reputation: 468

There is no such option in apify/web-scraper, the Actor intended to replace the legacy Crawler.

You can, however, implement it yourself in pageFunction: call context.waitFor() and pass it a random time in ms.

async function pageFunction(context) {
    const { request, log, jQuery } = context;

    // To use jQuery as $, save it into a variable and enable
    // the "Inject jQuery" option. We've selected it for you.
    const $ = jQuery;
    const title = $('title').text();

    log.info(`URL: ${request.url} TITLE: ${title}`);

    // Wait for the number of ms returned by getRandomWait().
    // getRandomWait is not built in; you need to define it yourself.
    await context.waitFor(getRandomWait());

    // To save data just return an object with the requested properties.
    return {
        url: request.url,
        title
    };
}
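
The snippet above calls getRandomWait() without defining it. A minimal sketch of such a helper follows; the parameter names minMs and maxMs and the 1000 to 5000 ms defaults are assumptions chosen for illustration, not Apify settings. Since the pageFunction runs in the browser context, the helper most likely needs to be declared inside pageFunction itself.

```javascript
// Hypothetical helper: returns a random integer delay in milliseconds,
// uniformly distributed in [minMs, maxMs] (both bounds inclusive).
// The default bounds are illustrative assumptions; tune them for the target site.
function getRandomWait(minMs = 1000, maxMs = 5000) {
    return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}
```

With the defaults, await context.waitFor(getRandomWait()) pauses each page for between 1 and 5 seconds; pass explicit bounds, for example getRandomWait(500, 1500), to change the range.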

If you would like this option in apify/web-scraper, you can submit an issue on its GitHub repo.

Upvotes: 3
