Reputation: 681
I'm using Puppeteer for the following. I switched

await link.click(`.ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a`);

for

await new.page('...')

but it says that it can't find the a element.
This is the page that I'm scraping (notice the Load More button at the bottom):
https://www.bodybuilding.com/exercises/finder
To avoid resetting the Load More button, I want to open each exercise in a new tab, scrape it, close the tab, and move on to the next name.
How can I open each link in a new tab, close it, and go back to the previous tab?
My code:
var buttonExists = true;
let allData = [];

while (buttonExists == true) {
  // const loadMore = true;
  const rowsCounts = await page.$$eval(
    '.ExCategory-results > .ExResult-row',
    (rows) => rows.length
  );
  console.log(`row counts = ${rowsCounts}`);

  for (let i = 2; i < rowsCounts + 1; i++) {
    const exerciseName = await page.$eval(
      `.ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a`,
      (el) => el.innerText
    );
    console.log(`Exercise = ${exerciseName}`);

    await link.click(`.ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a`);
    await page.waitForSelector('#js-ex-content');
    // ... fancy code here
    await page.goBack();

    let obj = {
      exercise: exerciseName,
    };
    allData.push(obj);
  }

  // clicking load more button and waiting 1 sec
  try {
    await page.click(LoadMoreButton);
  } catch (err) {
    buttonExists = false;
  }
  await page.waitForTimeout(1000);
}
Upvotes: 0
Views: 615
Reputation: 8352
This selector:

.ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a

is unnecessarily long, and it doesn't give you completely correct results. To get to those elements, this selector should be enough:

.ExResult-row .ExHeading > a
Then you asked:

I want to open each exercise in a new tab, scrape it, close the tab, and move on to the next name.

and

How can I open each link in a new tab, close it, and go back to the previous tab?
In Puppeteer you can create a new page like so: await browser.newPage(). You can do this as many times as you need and store the pages in an array:
let pages = [];
pages.push(await browser.newPage());
then you get the links:
const links = await pages[0].$$eval(
  '.ExResult-row .ExHeading > a',
  links => links.map(l => l.getAttribute('href'))
);
and finally, for each link, create a new page, scrape what you need, and close the page:
for (let link of links) {
  pages.push(await browser.newPage());
  await pages[pages.length - 1].goto(`${baseUrl}/${link}`);
  // your scraping
  await pages[pages.length - 1].close();
}
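One caveat with the loop above: `${baseUrl}/${link}` only works if every href is site-relative and `baseUrl` has no trailing slash. Since `getAttribute('href')` returns the raw attribute value, which may be relative or absolute, Node's built-in URL constructor is a safer way to join them. A minimal sketch (the example paths here are hypothetical, not taken from the site):

```javascript
// Resolve a scraped href against the site's base URL.
// new URL(href, base) handles both relative hrefs and
// already-absolute ones (which pass through unchanged).
const base = 'https://www.bodybuilding.com';

function resolveLink(href) {
  return new URL(href, base).href;
}

console.log(resolveLink('/exercises/barbell-squat'));
// https://www.bodybuilding.com/exercises/barbell-squat
console.log(resolveLink('https://example.com/other'));
// https://example.com/other
```

You can then call `page.goto(resolveLink(link))` instead of concatenating strings.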
If you need to look up more, refer to the API documentation Puppeteer provides.
Upvotes: 1