MITHU

Reputation: 154

Trouble clicking on different links using puppeteer

I've written two small scripts in Node using puppeteer that are meant to click, one after another, on the links to the different posts listed on a website's landing page.

The site used in my scripts is a placeholder, and its pages are not dynamic, so puppeteer might be overkill. However, my intention is to learn the logic of clicking.

When I execute my first script, it clicks once and then throws the error shown below it, because the click navigates away from the landing page.

const puppeteer = require("puppeteer");

(async () => {
    const browser = await puppeteer.launch({headless:false});
    const [page] = await browser.pages();
    await page.goto("https://stackoverflow.com/questions/tagged/web-scraping",{waitUntil:'networkidle2'});
    await page.waitFor(".summary");
    const sections = await page.$$(".summary");

    for (const section of sections) {
        await section.$eval(".question-hyperlink", el => el.click())
    }

    await browser.close();
})();

The error the above script encounters:

(node:9944) UnhandledPromiseRejectionWarning: Error: Execution context was destroyed, most likely because of a navigation.

When I execute the following script, it appears to click once (in reality it does not) and then runs into the same error as before.

const puppeteer = require("puppeteer");

(async () => {
    const browser = await puppeteer.launch({headless:false});
    const [page] = await browser.pages();
    await page.goto("https://stackoverflow.com/questions/tagged/web-scraping");

    await page.waitFor(".summary .question-hyperlink");
    const sections = await page.$$(".summary .question-hyperlink");

    for (let i=0, lngth = sections.length; i < lngth; i++) {
        await sections[i].click();
    }

    await browser.close();
})();

The error the second script throws:

(node:10128) UnhandledPromiseRejectionWarning: Error: Execution context was destroyed, most likely because of a navigation.

How can I make my script click each link in turn?

Upvotes: 2

Views: 5508

Answers (2)

robots.txt

Reputation: 137

Instead of clicking all the links cyclically, I find it better to parse all the links and then navigate to each of them reusing the same browser. Give it a shot:

const puppeteer = require("puppeteer");

(async () => {
    const browser = await puppeteer.launch({headless:false});
    const [page] = await browser.pages();
    const base = "https://stackoverflow.com";
    await page.goto("https://stackoverflow.com/questions/tagged/web-scraping");
    const links = [];
    // waitForSelector replaces the deprecated page.waitFor
    await page.waitForSelector(".summary .question-hyperlink");
    const sections = await page.$$(".summary .question-hyperlink");

    // collect every href first, while the listing page is still loaded
    for (const section of sections) {
        const clink = await page.evaluate(el => el.getAttribute("href"), section);
        links.push(`${base}${clink}`);
    }

    // then visit each collected link in the same tab
    for (const link of links) {
        await page.goto(link);
        await page.waitForSelector('h1 > a');
    }
    await browser.close();
})();
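
String concatenation works above because every href on the listing page is root-relative. A slightly more defensive sketch resolves each href against the page URL with the standard URL class, which also copes with hrefs that are already absolute. Note that toAbsoluteLinks is a hypothetical helper name used only for illustration, not part of puppeteer, and the sample hrefs are made up.

```javascript
// Hypothetical helper: resolve hrefs (relative or absolute) against a base URL.
// Relies on the WHATWG URL class, which is available as a global in Node.js.
function toAbsoluteLinks(hrefs, base) {
    return hrefs
        // skip anchors that had no href attribute (getAttribute returns null)
        .filter(href => typeof href === "string" && href.length > 0)
        .map(href => new URL(href, base).href);
}

// Made-up sample values standing in for what the page would return.
const links = toAbsoluteLinks(
    ["/questions/1/example-question", "https://stackoverflow.com/questions/2/other"],
    "https://stackoverflow.com/questions/tagged/web-scraping"
);
console.log(links);
```

In the script above, the hrefs collected in the first loop could be passed through such a helper instead of being prefixed with base by hand.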

Upvotes: 0

Md. Abu Taher

Reputation: 18816

Problem:

Execution context was destroyed, most likely because of a navigation.

The error says you tried to click a link, or do something else, on a page that no longer exists, most likely because you navigated away from it.

Logic:

Think of the puppeteer script as a real human browsing the real page.

First, we load the url (https://stackoverflow.com/questions/tagged/web-scraping).

Next, we want to go through all the questions listed on that page. What would we normally do? Either of the following:

  • Open a link in a new tab, switch to that tab, finish our work, and come back to the original tab. Continue with the next link.
  • Click a link, do our stuff, go back to the previous page, and continue with the next link.

Both involve moving away from the current page and coming back to it.

If you don't follow this flow, you will get the error message above.

Solution

There are at least four ways to resolve this. I will cover two of them: the simplest one and a more involved one.

Way: Link Extraction

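First we extract all the links on the current page. Note that $$eval passes an array of matched elements to the callback, so we map over it.

const links = await page.$$eval(".question-hyperlink", elements => elements.map(element => element.href));

This gives us a list of URLs. We can create a new tab for each link.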

for(let link of links){
  const newTab = await browser.newPage();
  await newTab.goto(link);
  // do the stuff
  await newTab.close();
}

This will go through the links one by one. We could improve this by opening several tabs concurrently, for example with a Promise.map-style helper or a queue library, but you get the idea.
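
As a rough illustration of that concurrency idea, here is a minimal sketch of a helper that runs a worker over a list with a cap on how many calls are in flight at once. The name mapWithLimit and its signature are assumptions made for this example; it is not a puppeteer API and is only one of several ways to do this.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit`
// calls in flight at the same time. Results keep the order of `items`.
async function mapWithLimit(items, limit, worker) {
    const results = new Array(items.length);
    let next = 0; // index of the next item to hand out

    // Each runner keeps pulling the next unclaimed item until none are left.
    const runner = async () => {
        while (next < items.length) {
            const i = next++;
            results[i] = await worker(items[i], i);
        }
    };

    // Start up to `limit` runners and wait for all of them to finish.
    const runners = Array.from({ length: Math.min(limit, items.length) }, runner);
    await Promise.all(runners);
    return results;
}

// Possible usage with the `browser` and `links` from above (not run here):
// const titles = await mapWithLimit(links, 3, async link => {
//     const tab = await browser.newPage();
//     try {
//         await tab.goto(link);
//         return await tab.title();
//     } finally {
//         await tab.close();
//     }
// });
```

If any worker call rejects, the returned promise rejects too, so in a real script the worker would probably catch its own errors and close its tab in a finally block as shown in the commented usage.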

Way: Coming back to main page

We need to store some state so we know which link we visited last. If we visited the third question and came back to the tag page, we need to visit the fourth question next, and so on.

Check the following code.

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  await page.goto(
    `https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&pagesize=15`
  );

  const visitLink = async (index = 0) => {
    await page.waitForSelector("div.summary > h3 > a");

    // extract the links to click, we need this every time
    // because the context will be destroyed once we navigate
    const links = await page.$$("div.summary > h3 > a");
    // assuming there are 15 questions on one page,
    // we will stop on 16th question, since that does not exist
    if (links[index]) {
      console.log("Clicking ", index);

      // no `await` inside the array, otherwise the promises run one after
      // another and waitForNavigation can miss the navigation entirely
      await Promise.all([
        // start waiting for the navigation before the click triggers it
        page.waitForNavigation(),

        // click the link at the current index
        page.evaluate(element => {
          element.click();
        }, links[index])
      ]);

      // then wait for the post body so we know the question page has loaded
      await page.waitForSelector(".post-text");

      const currentPage = await page.title();
      console.log(index, currentPage);

      // go back and visit next link
      await page.goBack({ waitUntil: "networkidle0" });
      return visitLink(index + 1);
    }
    console.log("No links left to click");
  };

  await visitLink();

  await browser.close();
})();
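
The same flow can also be written as a loop instead of recursion. The following is only a sketch under a few assumptions: visitAll and its onVisit callback are hypothetical names introduced here, and page is expected to be a puppeteer Page (anything exposing waitForSelector, $$, evaluate, waitForNavigation and goBack would do).

```javascript
// Hypothetical helper: click every link matched by `selector`, one at a time,
// returning to the listing page after each visit.
// Resolves to the number of links visited.
async function visitAll(page, selector, onVisit) {
    for (let index = 0; ; index++) {
        await page.waitForSelector(selector);

        // Handles obtained before the last navigation are stale,
        // so the links are queried again on every iteration.
        const links = await page.$$(selector);
        if (!links[index]) return index;

        await Promise.all([
            page.waitForNavigation(), // registered before the click fires
            page.evaluate(element => element.click(), links[index]),
        ]);

        await onVisit(page, index); // do the per-question work here
        await page.goBack();
    }
}

// Possible usage inside the async function above (not run here):
// await visitAll(page, "div.summary > h3 > a", async (p, i) => {
//     console.log(i, await p.title());
// });
```

The index is the only state that survives a navigation, which is why it lives in the loop variable while the element handles are thrown away and fetched again each time.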


EDIT: There are multiple questions similar to this one. I will be referencing them in case you want to learn more.

Upvotes: 7
