Reputation: 55
Getting data from one page is simple, but how do I go back after scraping the first page, enter a second page, scrape that one, and so on? I am practicing on http://books.toscrape.com/.
I chose to print how many copies of a book are in stock, because that information is only visible after following the book's link. For example, if you run the code below you will get: { stock: 'In stock (22 available)' }
Now I want to go back to the original page, follow the second link, and extract the same information as before. And so on.
How can this be done using vanilla JavaScript?
const puppeteer = require('puppeteer');

let scrape = async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto('http://books.toscrape.com/');
    await page.click('#default > div > div > div > div > section > div:nth-child(2) > ol > li:nth-child(1) > article > div.image_container > a > img');
    await page.waitFor(1000);
    const result = await page.evaluate(() => {
        let stock = document.querySelector('#content_inner > article > table > tbody > tr:nth-child(6) > td').innerText;
        return {
            stock
        };
    });
    await browser.close();
    return result;
};

scrape().then((value) => {
    console.log(value); // Success!
});
Upvotes: 4
Views: 6217
Reputation: 25280
What you need to do is call page.goBack() once your task on the detail page is finished, and then click the next element. For this you should use page.$$ to get the list of clickable elements and loop over them one after another, running the same extraction inside the loop for each page. Note that after navigating back, the previously obtained element handles are stale, so the list has to be queried again on every iteration.
I adapted your code below to print your desired result to the console for each page. Be aware that I changed the selector from your question to remove the :nth-child(1), so that it selects all clickable elements.
const puppeteer = require('puppeteer');

const elementsToClickSelector = '#default > div > div > div > div > section > div:nth-child(2) > ol > li > article > div.image_container > a > img';

let scrape = async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto('http://books.toscrape.com/');

    // get all elements to be clicked
    let elementsToClick = await page.$$(elementsToClickSelector);
    console.log(`Elements to click: ${elementsToClick.length}`);

    for (let i = 0; i < elementsToClick.length; i++) {
        // click the element and give the detail page time to load
        await elementsToClick[i].click();
        await page.waitFor(1000);

        // extract the result from the current detail page
        const result = await page.evaluate(() => {
            let stock = document.querySelector('#content_inner > article > table > tbody > tr:nth-child(6) > td').innerText;
            return { stock };
        });
        console.log(result); // do something with the result here...

        // go back one page; the old element handles are now stale,
        // so the list has to be repopulated
        await page.goBack();
        elementsToClick = await page.$$(elementsToClickSelector);
    }

    await browser.close();
};

scrape();
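As a side note, the click-and-go-back approach can also be replaced by collecting the detail-page URLs up front (for example via page.$$eval on the anchor elements) and visiting each one directly with page.goto. Either way, the core pattern is the same: a sequential for loop with await, so that one navigation fully finishes before the next begins. Here is a minimal sketch of that pattern with a hypothetical visit function standing in for the page.goto/page.evaluate calls (the links and the returned stock value are placeholders, not real puppeteer output):

```javascript
// Sequential scraping pattern: process the collected links one at a time
// with await, instead of firing them all at once with Promise.all.
const links = ['/page-1.html', '/page-2.html', '/page-3.html'];

// Hypothetical stand-in for: await page.goto(url); await page.evaluate(...)
const visit = async (url) => {
    return { url, stock: 'In stock' };
};

const scrapeAll = async () => {
    const results = [];
    for (const link of links) {
        // awaiting inside the loop keeps the visits strictly sequential
        results.push(await visit(link));
    }
    return results;
};

scrapeAll().then((results) => console.log(results.length)); // 3
```

Sequential iteration matters here because a single page object can only be on one URL at a time; parallelizing would require one page (or browser) per link.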
Upvotes: 6