user9746492

Reputation: 55

Puppeteer: Open a page, get the data, go back to the previous page, enter a new page to get data

Getting data from one page is simple, but how do I go back after getting the data from the first page, enter a new page, get data from that page, and so on? I am trying to do this on the website http://books.toscrape.com/.

So I chose to print how many books are in stock, because that information can only be accessed by entering the link. For example, if you run the code you will get: { stock: 'In stock (22 available)' }

Now I wish to go back to the original page, enter the second link, and take the same information as before, and so on.

How can this be done using vanilla JavaScript?

const puppeteer = require('puppeteer');

let scrape = async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    await page.goto('http://books.toscrape.com/');
    // click the cover image of the first book to open its detail page
    await page.click('#default > div > div > div > div > section > div:nth-child(2) > ol > li:nth-child(1) > article > div.image_container > a > img');
    await page.waitFor(1000);

    const result = await page.evaluate(() => {
        // read the "Availability" row of the product information table
        let stock = document.querySelector('#content_inner > article > table > tbody > tr:nth-child(6) > td').innerText;

        return {
            stock
        }
    });

    browser.close();
    return result;
};

scrape().then((value) => {
    console.log(value); // Success!
});

Upvotes: 4

Views: 6217

Answers (1)

Thomas Dondorf

Reputation: 25280

Explanation

What you need to do is call page.goBack() to go back one page when your task on the detail page is finished, and then click the next element. For this you should use page.$$ to get the list of clickable elements and use a loop to step over them one after another. Inside the loop you can run the same extraction logic for each page. Note that the element handles may become stale after navigating back, which is why the list is queried again after each page.goBack().

Code

I adapted your code below so that it prints the desired result to the console for each page. Be aware that I changed the selector from your question, removing the :nth-child(1) so that all clickable elements are selected.

const puppeteer = require('puppeteer');

const elementsToClickSelector = '#default > div > div > div > div > section > div:nth-child(2) > ol > li > article > div.image_container > a > img';

let scrape = async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    await page.goto('http://books.toscrape.com/');

    // get all elements to be clicked
    let elementsToClick = await page.$$(elementsToClickSelector);
    console.log(`Elements to click: ${elementsToClick.length}`);

    for (let i = 0; i < elementsToClick.length; i++) {
        // click element
        await elementsToClick[i].click();
        await page.waitFor(1000);

        // generate result for the current page
        const result = await page.evaluate(() => {
            let stock = document.querySelector('#content_inner > article > table > tbody > tr:nth-child(6) > td').innerText;
            return { stock };
        });
        console.log(result); // do something with the result here...

        // go back one page and repopulate the elements
        await page.goBack();
        elementsToClick = await page.$$(elementsToClickSelector);
    }

    browser.close();
};

scrape();
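
If you want to do more than log each result, you could collect the results in an array and parse the number of available copies out of the stock string at the place marked "do something with the result here". A minimal sketch, assuming the availability text always follows the 'In stock (22 available)' format shown in the question; parseAvailable and results are hypothetical names introduced here, not part of the answer's code:

// Hypothetical helper: extract the number of available copies
// from a string like 'In stock (22 available)'.
const parseAvailable = (stockText) => {
    const match = stockText.match(/\((\d+) available\)/);
    return match ? Number(match[1]) : null;
};

// Declared before the loop:
const results = [];

// Inside the loop, instead of console.log(result):
// results.push({ stock: result.stock, available: parseAvailable(result.stock) });

// After the loop, results would look like:
// [ { stock: 'In stock (22 available)', available: 22 }, ... ]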

Upvotes: 6
