avocado

Reputation: 2735

Puppeteer parallel scraping via multiple pages

I wanted to scrape multiple URLs simultaneously, so I used p-queue to set up a promise queue.
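For readers unfamiliar with the idea, here is a minimal sketch of what a concurrency-capped promise queue does. It is not p-queue's implementation: `createLimiter`, `demo`, and the placeholder URLs are hypothetical names made up for illustration, and a timer stands in for a real page visit.

```javascript
// Hypothetical helper (not part of p-queue): runs async tasks with a concurrency cap.
function createLimiter(concurrency) {
  let active = 0;      // tasks currently running
  const waiting = [];  // tasks not yet started

  const next = () => {
    if (active >= concurrency || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active--;
        next(); // a slot is free, start the next waiting task
      });
  };

  return {
    add(task) {
      return new Promise((resolve, reject) => {
        waiting.push({ task, resolve, reject });
        next();
      });
    },
  };
}

// Stand-in for visiting pages: records how many tasks overlap at once.
async function demo() {
  const limiter = createLimiter(2);
  let running = 0;
  let peak = 0;
  const visit = (url) => async () => {
    running++;
    peak = Math.max(peak, running);
    await new Promise((r) => setTimeout(r, 10)); // simulated work
    running--;
    return url;
  };
  const urls = ['a', 'b', 'c', 'd', 'e']; // placeholder URLs
  const results = await Promise.all(urls.map((u) => limiter.add(visit(u))));
  return { results, peak };
}
```

Calling `demo()` resolves to the visited URLs in their original order, plus the highest number of tasks that ran at the same time, which should never exceed the cap passed to `createLimiter`.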

For example, the code below uses one browser and multiple pages to do this job.

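// Assumed imports (the original post does not show them)
import pptr from 'puppeteer';
import PQueue from 'p-queue';

const queue = new PQueue({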
    concurrency: 5
});

(
    async () => {
        let instance = await pptr.launch({
            headless: false,
        });

        // task processor function
        const createInstance = async (url) => {
            let page = await instance.newPage();
            await page.goto(url);

            // (PROBLEM) more operations go here
            ...

            return await page.close();
        }

        // add tasks to queue
        for (let url of urls) {
            queue.add(async () => createInstance(url))
        } 
    }
)()

The problem: multiple URLs do open at the same time in separate pages, but it looks like only the page currently focused in the browser keeps running its operations (the "more operations go here" section in the code above). The other pages (tabs) stop working until I click on them to give them focus.

So is there any workaround to run all the pages simultaneously?

Upvotes: 6

Views: 7891

Answers (1)

avocado

Reputation: 2735

I found out why the code above didn't work: I shouldn't await the browser instance outside the worker function, but inside it. See below:

(
    async () => {
        let instance = pptr.launch({  // don't await here
            headless: false,
        });

        // task processor function
        const createInstance = async (url) => {
            let real_instance = await instance;  // await here
            let page = await real_instance.newPage();
            await page.goto(url);

            // (PROBLEM) more operations go here
            ...

            return await page.close();
        }

        // add tasks to queue
        for (let url of urls) {
            queue.add(async () => createInstance(url))
        } 
    }
)()
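The pattern above keeps the pending promise returned by the launch call and lets every worker await that same promise. Here is a minimal sketch of just that mechanism, under stated assumptions: `fakeLaunch` is a hypothetical stand-in for `pptr.launch` so the snippet runs without a browser, and `runWorkers` is an illustrative name. It only shows that the shared promise resolves once and hands every worker the same object. It does not reproduce the tab-focus behavior.

```javascript
// Hypothetical stand-in for pptr.launch(): resolves to a fake browser object.
let launchCount = 0;
function fakeLaunch() {
  launchCount++;
  return new Promise((resolve) =>
    setTimeout(() => resolve({ id: launchCount, pagesOpened: 0 }), 10)
  );
}

async function runWorkers(urls) {
  const instance = fakeLaunch(); // not awaited here: a pending promise

  const worker = async (url) => {
    const browser = await instance; // every worker awaits the same promise
    browser.pagesOpened++;          // stand-in for browser.newPage()
    return { url, browserId: browser.id };
  };

  return Promise.all(urls.map(worker));
}
```

Awaiting one promise many times always yields the same resolved value, so a call such as `runWorkers(['a', 'b', 'c'])` launches once and all workers share that single fake browser.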

Upvotes: 5
