Reputation: 55
I'm working on a Node.js project where I need to run multiple Puppeteer instances in parallel to scrape data from a website. I've read that using worker_threads can help achieve this by running tasks concurrently, but I'm having trouble getting it to work correctly.
Only one worker seems to execute correctly; the others start only when I manually interact with them. I also get timeout errors, which suggests that the workers are not being managed properly.
Here's what I've done so far:
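import { Worker } from 'node:worker_threads';
import puppeteer from 'puppeteer';

// Launch one shared browser; every worker connects to it over WebSocket.
const browser = await puppeteer.launch({
  headless: false,
  devtools: false,
  channel: 'chrome',
});
const wsEndpoint = browser.wsEndpoint();
for (let i = 0; i < 4; i++) {
  new Worker(WORKER_PATH, {
    workerData: {
      wsEndpoint,
      url: '<some url>',
    },
  });
}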
worker.js
import { workerData } from 'node:worker_threads';
import { connect } from 'puppeteer-core';

(async () => {
  try {
    const { wsEndpoint, url } = workerData as {
      wsEndpoint: string;
      url: string;
    };
    // Attach to the browser launched by the main thread.
    const browser = await connect({
      browserWSEndpoint: wsEndpoint,
      defaultViewport: null,
    });
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.setViewport({ width: 1440, height: 1024 });
    // some code
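The main-thread loop above creates each Worker without listening to any of its events, so a failure inside a worker would go unnoticed. As a rough sketch of one way to make that visible, each worker could be wrapped in a Promise that settles on its `message`, `error`, or `exit` event. The names `WORKER_SOURCE`, `runWorker`, and `runWorkers` are hypothetical and not from the post. The inline worker is only a stand-in for worker.js: it echoes its `workerData` back and does no Puppeteer work. The sketch uses CommonJS `require` so it runs as a plain script.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical stand-in for worker.js: it only echoes its workerData back.
// A real worker would connect to the browser and scrape at this point.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require('node:worker_threads');
  parentPort.postMessage({ id: workerData.id, url: workerData.url });
`;

// Wrap one Worker in a Promise so errors and early exits are not silent.
function runWorker(workerData) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, { eval: true, workerData });
    worker.once('message', resolve);
    worker.once('error', reject);
    // 'exit' fires last; rejecting an already settled Promise is a no-op,
    // so this only takes effect if the worker exited without reporting.
    worker.once('exit', (code) => {
      reject(new Error(`worker exited with code ${code} before reporting`));
    });
  });
}

// Start `count` workers and wait for all of them to report.
function runWorkers(count, url) {
  const jobs = [];
  for (let i = 0; i < count; i++) {
    jobs.push(runWorker({ id: i, url }));
  }
  return Promise.all(jobs);
}
```

Calling `runWorkers(4, '<some url>')` returns a Promise that resolves to one result per worker, in start order, or rejects as soon as any worker throws or exits without posting a message. This only surfaces worker failures; it does not by itself explain the timeouts described above.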
Upvotes: 0
Views: 168