Reputation: 2735
I wanted to scrape multiple URLs simultaneously, so I used p-queue to implement a promise queue.
For example, the code below uses one browser and multiple pages to do the job:
const pptr = require('puppeteer');
const PQueue = require('p-queue').default; // p-queue v6; older versions export the class directly, v7+ is ESM-only

const queue = new PQueue({
  concurrency: 5
});

(async () => {
  const instance = await pptr.launch({
    headless: false,
  });

  // task processor function
  const createInstance = async (url) => {
    const page = await instance.newPage();
    await page.goto(url);
    // (PROBLEM) more operations go here
    // ...
    return await page.close();
  };

  // add tasks to queue; `urls` is an array of URL strings defined elsewhere
  for (const url of urls) {
    queue.add(() => createInstance(url));
  }
})();
The problem is that multiple URLs do get opened at the same time in multiple pages, but it looks like only the single page currently focused by the browser keeps performing its operations (the more operations go here section in the code above); the other pages (tabs) simply stop working until I click on them to bring them into focus.
So is there any workaround to run all the pages simultaneously?
Upvotes: 6
Views: 7891
Reputation: 2735
I found out why the above code didn't work: I shouldn't await the pptr.launch() promise outside of the worker function, but await it inside instead. See below:
(async () => {
  const instance = pptr.launch({ // don't await here
    headless: false,
  });

  // task processor function
  const createInstance = async (url) => {
    const real_instance = await instance; // await here
    const page = await real_instance.newPage();
    await page.goto(url);
    // more operations go here
    // ...
    return await page.close();
  };

  // add tasks to queue
  for (const url of urls) {
    queue.add(() => createInstance(url));
  }
})();
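As a small follow-up sketch that is not part of the original answer: p-queue also exposes an onIdle() method that resolves once the queue is empty and every running task has settled, so inside the IIFE, after the for loop, you could wait for all pages to finish and then close the shared browser:

  // wait until the queue is drained and all running tasks have settled
  await queue.onIdle();
  // the launch promise resolves to the browser, which can now be closed safely
  const browser = await instance;
  await browser.close();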
Upvotes: 5