Ilya

How to Parallelize Puppeteer Tasks Using Worker Threads in Node.js?

I'm working on a Node.js project where I need to run multiple Puppeteer instances in parallel to scrape data from a website. I've read that using worker_threads can help achieve this by running tasks concurrently, but I'm having trouble getting it to work correctly.

Here's what I've done so far; the main script and worker.js are below.

Only one worker seems to execute correctly; the others only start once I manually interact with them. I also get timeout errors, which suggests the workers aren't being managed properly.

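// Assumed imports; puppeteer-core is used here to match worker.js.
import puppeteer from 'puppeteer-core';
import { Worker } from 'node:worker_threads';

const browser = await puppeteer.launch({
  headless: false,
  devtools: false,
  channel: 'chrome',
});
// Every worker attaches to this one browser through its WebSocket endpoint.
const wsEndpoint = browser.wsEndpoint();
for (let i = 0; i < 4; i++) {
  new Worker(WORKER_PATH, {
    workerData: {
      wsEndpoint,
      url: '<some url>',
    },
  });
}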

worker.js

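import { workerData } from 'node:worker_threads';
import { connect } from 'puppeteer-core';

(async () => {
  try {
    // The cast is TypeScript syntax; drop it if this file runs as plain JavaScript.
    const { wsEndpoint, url } = workerData as {
      wsEndpoint: string;
      url: string;
    };
    // Attach to the browser launched by the main thread.
    const browser = await connect({
      browserWSEndpoint: wsEndpoint,
      defaultViewport: null,
    });
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.setViewport({ width: 1440, height: 1024 });
    // some code
  } catch (err) {
    // Placeholder handler; the rest of the worker is omitted.
    console.error(err);
  }
})();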
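
Since the main script above never listens to its workers, one way to make failures and timeouts visible might be to wrap each worker in a promise. The following is only a sketch under assumptions: `runWorker`, `runAll`, `demoWorker`, and the 30-second default are hypothetical names and values that are not part of my project, and the inline worker is a stand-in that just echoes its `workerData` so the example runs without a browser. It uses CommonJS `require` so it can be run as a plain Node.js script.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical helper: resolves with the worker's first message and rejects
// on an uncaught error, an exit without a message, or a timeout.
function runWorker(script, workerData, { timeoutMs = 30000, ...options } = {}) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(script, { ...options, workerData });
    let settled = false;
    const finish = (fn, value) => {
      if (settled) return; // only the first outcome counts
      settled = true;
      clearTimeout(timer);
      fn(value);
    };
    const timer = setTimeout(() => {
      worker.terminate();
      finish(reject, new Error(`worker timed out after ${timeoutMs} ms`));
    }, timeoutMs);
    worker.once('message', (msg) => finish(resolve, msg));
    worker.once('error', (err) => finish(reject, err));
    worker.once('exit', (code) => {
      finish(reject, new Error(`worker exited with code ${code} before reporting`));
    });
  });
}

// Stand-in worker body (assumption): a real worker would connect to the
// browser and scrape `url`, then post its result back the same way.
const demoWorker = `
  const { parentPort, workerData } = require('node:worker_threads');
  parentPort.postMessage({ url: workerData.url, ok: true });
`;

// Starts one worker per URL and waits for all of them to settle.
function runAll(urls) {
  // allSettled keeps one failed worker from hiding the others' results.
  return Promise.allSettled(
    urls.map((url) => runWorker(demoWorker, { url }, { eval: true }))
  );
}
```

Calling `runAll(['<url 1>', '<url 2>']).then(console.log)` would print one settled entry per worker. In the real setup, `runWorker(WORKER_PATH, { wsEndpoint, url })` would presumably replace the stand-in, with worker.js calling `parentPort.postMessage(...)` once it finishes.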
