Retry failed pages with new proxyUrl

Question

I have developed a crawler based on Actor + PuppeteerCrawler + Proxy and want to rescrape failed pages. To improve the chances of the rescrape succeeding, I want to switch to another proxyUrl. The idea is to create a new crawler with a modified launchPuppeteerFunction and a different proxyUrl, and to re-enqueue the failed pages. Please check the sample code below.

Unfortunately, it doesn't work, even though I reset the request queue by dropping and reopening it. Is it possible to rescrape failed pages using PuppeteerCrawler with a different proxyUrl, and if so, how?

Best regards, Wolfgang

for(let retryCount = 0; retryCount <= MAX_RETRY_COUNT; retryCount++){

    if(retryCount){
        // Try to reset the request queue so that failed requests will be rescraped
        await requestQueue.drop();
        requestQueue = await Apify.openRequestQueue();   // this is necessary to avoid exceptions
        // Re-enqueue the failed URLs from the failedUrls array >>> they are ignored despite drop() and reopening the request queue!
        for(let failedUrl of failedUrls){
            await requestQueue.addRequest({url: failedUrl});
        }
    }

    crawlerOptions.launchPuppeteerFunction = () => {
        return Apify.launchPuppeteer({
            // generates a new proxy url and adds it to a new launchPuppeteer function
            proxyUrl: createProxyUrl()
        });
    };

    let crawler = new Apify.PuppeteerCrawler(crawlerOptions);
    await crawler.run();

}
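
One thing that may be worth checking, though it is not confirmed as the cause here: the Apify request queue deduplicates requests by their uniqueKey, which defaults to the normalized URL. If the queue still knows a URL from the previous round, re-adding it has no effect. The sketch below shows one way to give each retry round its own keys. The helper name buildRetryRequests, the "#retry-N" key suffix, and the example URLs are assumptions made for illustration, not part of the Apify SDK.

```javascript
// Hypothetical helper (not part of the Apify SDK): builds request objects whose
// uniqueKey changes with every retry round, so a queue that deduplicates by
// uniqueKey does not treat a re-added URL as already handled.
function buildRetryRequests(failedUrls, retryCount) {
    return failedUrls.map((url) => ({
        url,
        // The "#retry-N" suffix is an arbitrary convention chosen for this sketch.
        uniqueKey: `${url}#retry-${retryCount}`,
    }));
}

// Example input; in the crawler these would be the URLs collected in failedUrls.
const retryRequests = buildRetryRequests(
    ['https://example.com/a', 'https://example.com/b'],
    1,
);

// In the retry loop, each object would then be passed to the queue:
//     for (const request of retryRequests) {
//         await requestQueue.addRequest(request);
//     }
console.log(retryRequests);
```

With keys like these, dropping and reopening the queue might not be needed at all, since every round adds requests the queue has not seen before. The new crawler would still have to be given the queue instance that the requests were added to.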
