Nahar

Reputation: 269

Creating a new tab in Puppeteer inside a loop causes a navigation timeout

I have recently been learning Puppeteer from its docs and am trying to scrape some information.

First approach

First, I collect a list of URLs from the main page. Then I create a new tab, visit those URLs one by one, and collect some data. I suspect that once I enter the loop the new tab doesn't work as I expect: it freezes without returning any data. Eventually I get the error TimeoutError: Navigation timeout of 30000 ms exceeded. Is there a better approach?

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const mainpage = await browser.newPage();

  console.log('goto main page'.green);
  await mainpage.goto(mainURL);

  console.log('collecting some url'.green);
  const URLS = await mainpage.evaluate(() =>
    Array.from(
      document.querySelectorAll('.result-actions a'),
      (element) => element.href
    )
  );
  if (typeof URLS[0] === 'string') console.log('OK'.green);

  console.log('collecting finished'.green);

  const newTab = await browser.newPage();

  console.log('create new tab'.green);

  var data = [];

  for (let i = 0, n = URLS.length; i < n; i++) {
    //console.log(URLS[i]);

    // use this new tab to collect some data then close this tab
    // continue this process

    await newTab.waitForNavigation();
    await newTab.goto(URLS[i]);
    await newTab.waitForSelector('.profile-phone-column span a');
    console.log('Go each url using new tab'.green);

    // collecting data
    
    data.push(collected_data);
    // close this tab
    await newTab.close();
    console.log(data);
  }
  await mainpage.close();
  await browser.close();
  console.log('closing browser'.green);
})();

Second approach

This time I wanted to skip the part where I collect the data in a new tab. So I collected my URLs using page.$$(), iterated over them with for...of, and tried to collect my data using elementHandle.$(selector), but this approach failed too.

I am getting frustrated. Am I doing this the wrong way, or have I misunderstood the documentation?

Upvotes: 0

Views: 657

Answers (1)

vsemozhebuty

Reputation: 13812

  1. In your script, you do not need newTab.waitForNavigation() at all. It is usually used when the navigation is triggered by some event, such as a click. When you call .goto(), Puppeteer waits for the page to load automatically.

  2. Even if you do need waitForNavigation(), you usually should not await it before the navigation is triggered; otherwise it just times out. Await it together with the call that triggers the navigation:

    await Promise.all([element.click(), page.waitForNavigation()]);
    

So try simply deleting await newTab.waitForNavigation();.

Also, do not close the new tab inside the loop; close it after the loop.
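
To see why point 2 matters, here is a minimal sketch in plain Node.js. It does not use Puppeteer: FakePage, its 100 ms timeout, and the example.com URLs are hypothetical stand-ins that only mimic how a wait-for-navigation promise behaves. Awaiting the wait first times out because nothing has triggered a navigation yet; starting the wait and the trigger together resolves.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for a Puppeteer page: waitForNavigation() resolves on
// the next 'navigated' event or rejects after timeoutMs.
class FakePage extends EventEmitter {
  waitForNavigation(timeoutMs = 100) {
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.off('navigated', onNavigated);
        reject(new Error(`Navigation timeout of ${timeoutMs} ms exceeded`));
      }, timeoutMs);
      const onNavigated = (url) => {
        clearTimeout(timer);
        resolve(url);
      };
      this.once('navigated', onNavigated);
    });
  }

  // Simulates a click or goto() whose navigation completes a little later.
  async navigate(url) {
    await new Promise((resolve) => setTimeout(resolve, 10));
    this.emit('navigated', url);
  }
}

async function demo() {
  const page = new FakePage();
  const results = {};

  // Wrong order: nothing triggers a navigation while we wait, so the wait
  // times out and navigate() is never reached.
  try {
    await page.waitForNavigation();
    await page.navigate('https://example.com/a');
    results.sequential = 'resolved';
  } catch (err) {
    results.sequential = err.message;
  }

  // Right order: start waiting and trigger the navigation together.
  const [url] = await Promise.all([
    page.waitForNavigation(),
    page.navigate('https://example.com/b'),
  ]);
  results.together = url;

  return results;
}

demo().then((results) => console.log(results));
```

Under these assumptions, the first attempt ends with the timeout message and the second returns the URL, which mirrors what happens in your loop.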


Edited script:

const puppeteer = require('puppeteer');
const mainURL = 'https://www.psychologytoday.com/us/therapists/illinois/';

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const mainpage = await browser.newPage();

  console.log('goto main page');
  await mainpage.goto(mainURL);

  console.log('collecting urls');
  const URLS = await mainpage.evaluate(() =>
    Array.from(
      document.querySelectorAll('.result-actions a'),
      (element) => element.href
    )
  );
  if (typeof URLS[0] === 'string') console.log('OK');
  console.log('collection finished');

  const collectNamePage = await browser.newPage();

  console.log('create new tab');

  const data = [];

  for (let i = 0, totalUrls = URLS.length; i < totalUrls; i++) {
    console.log(URLS[i]);

    await collectNamePage.goto(URLS[i]);
    await collectNamePage.waitForSelector('.profile-phone-column span a');
    console.log('create new tab and go there');

    // collecting data
    const [name, phone] = await collectNamePage.evaluate(
      () => [
        document.querySelector('.profile-middle .name-title-column h1').innerText,
        document.querySelector('.profile-phone-column span a').innerText
      ]
    );
    data.push({ name, phone });
  }

  console.log(data);
  await collectNamePage.close();

  await mainpage.close();
  await browser.close();
  console.log('closing browser');
})();
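
If a single slow profile page should not abort the whole run, one possible variation is to wrap each iteration in try/catch and record the failures. This is only a sketch: scrapeProfiles and its failed list are names I made up for illustration, the selectors are the ones from the script above, and the browser argument is assumed to be the object returned by puppeteer.launch().

```javascript
// Hypothetical helper: visits each URL in one reused tab and records failures
// instead of aborting the whole run on the first timeout.
// `browser` is assumed to be a Puppeteer Browser instance.
async function scrapeProfiles(browser, urls, { timeout = 30000 } = {}) {
  const page = await browser.newPage();
  const data = [];
  const failed = [];

  try {
    for (const url of urls) {
      try {
        // goto() already waits for the page load; no waitForNavigation() needed.
        await page.goto(url, { timeout });
        await page.waitForSelector('.profile-phone-column span a', { timeout });

        const [name, phone] = await page.evaluate(() => [
          document.querySelector('.profile-middle .name-title-column h1').innerText,
          document.querySelector('.profile-phone-column span a').innerText,
        ]);
        data.push({ name, phone });
      } catch (err) {
        // Keep going; remember which URL failed and why.
        failed.push({ url, reason: err.message });
      }
    }
  } finally {
    // Close the tab once, after the loop, even if something unexpected throws.
    await page.close();
  }

  return { data, failed };
}
```

In the edited script, the for loop could then be replaced by const { data, failed } = await scrapeProfiles(browser, URLS);.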

Upvotes: 1
