Bruce Mathers

Reputation: 681

Open a link in a new tab, scrape, go to previous page

I'm using puppeteer for the following:

I replaced await link.click(`.ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a`); with await new.page('...'), but it says that it can't find the a element.

This is the page that I'm scraping but notice the Load More button at the bottom of the page.

https://www.bodybuilding.com/exercises/finder

To avoid resetting the Load More button, I want to open each name in a new tab, scrape it, close the tab, and go on to the next name.

How can I open each link in a new tab, close, and go to the previous tab?

My code:

var buttonExists = true;
let allData = [];
while (buttonExists == true) {
    // const loadMore = true;
    const rowsCounts = await page.$$eval(
        '.ExCategory-results > .ExResult-row',
        (rows) => rows.length
    );
    console.log(`row counts = ${rowsCounts}`);

    for (let i = 2; i < rowsCounts + 1; i++) {
        const exerciseName = await page.$eval(
            `.ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a`,
            (el) => el.innerText
        );
        console.log(`Exercise = ${exerciseName}`);

        await link.click(`.ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a`,);
        await page.waitForSelector('#js-ex-content');

        // ... fancy code here

        await page.goBack();

        let obj = {
            exercise: exerciseName,
        };

        allData.push(obj);
    }

    // clicking load more button and waiting 1sec
    try {
        await page.click(LoadMoreButton);
    } catch (err) {
        buttonExists = false;
    }
    await page.waitForTimeout(1000);
}

Upvotes: 0

Views: 615

Answers (1)

pavelsaman

Reputation: 8352

This selector: .ExCategory-results > .ExResult-row:nth-child(${i}) > .ExResult-cell > .ExHeading > a is unnecessarily long, and it does not give you entirely correct results.

To get to the exercise links in the result rows, this selector should be enough: .ExResult-row .ExHeading > a.

Then you asked:

I want to open each name in a new tab, scrape, close tab and go to the next name.

and

How can I open each link in a new tab, close, and go to the previous tab?

In Puppeteer you can create a new page with await browser.newPage(). You can call it as many times as you need and store the pages in an array:

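let pages = [];
pages.push(await browser.newPage());
// A new page starts out blank, so load the listing first (URL taken from the question)
await pages[0].goto('https://www.bodybuilding.com/exercises/finder');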

Then get the links:

const links = await pages[0].$$eval(
    '.ExResult-row .ExHeading > a',
    links => links.map(l => l.getAttribute('href'))
);
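Note that getAttribute('href') returns the attribute exactly as written in the markup, so the values may be relative or absolute depending on the page. A minimal sketch of normalizing them before navigation, using the standard URL constructor; toAbsoluteUrl is a hypothetical helper name, and the base URL is only assumed from the address in the question:

```javascript
// Hypothetical helper: resolve an href (relative or absolute) against a base URL.
function toAbsoluteUrl(href, baseUrl) {
    // The WHATWG URL constructor leaves absolute hrefs alone and
    // resolves relative ones against baseUrl.
    return new URL(href, baseUrl).href;
}

// Assumed origin, inferred from the URL given in the question.
const siteOrigin = 'https://www.bodybuilding.com';

// '/some/path' is a placeholder, not a real path on the site.
console.log(toAbsoluteUrl('/some/path', siteOrigin));
// prints "https://www.bodybuilding.com/some/path"
```

This avoids having to care whether an href starts with a slash when joining it to the base.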

Finally, create a new page for each link, scrape what you need, and close the page:

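for (const link of links) {
    pages.push(await browser.newPage());
    // baseUrl is not defined in this snippet: set it to the site's origin beforehand,
    // and adjust the join if the href values already start with a slash
    await pages[pages.length - 1].goto(`${baseUrl}/${link}`);

    // your scraping

    await pages[pages.length - 1].close();
}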
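The same loop can be wrapped in a small function so that each tab is closed even when navigation or scraping throws. This is only a sketch of one way to arrange it: scrapeEachInNewTab and the scrape callback are hypothetical names, browser is assumed to be the Puppeteer Browser you already launched, and urls is assumed to hold full URLs.

```javascript
// Sketch: visit each URL in its own tab while the listing tab stays untouched.
// `browser` is a Puppeteer Browser; `scrape` is a hypothetical callback that
// receives the new Page and returns whatever data you extract from it.
async function scrapeEachInNewTab(browser, urls, scrape) {
    const results = [];
    for (const url of urls) {
        const tab = await browser.newPage();
        try {
            await tab.goto(url);
            results.push(await scrape(tab));
        } finally {
            // Close the tab even if goto() or scrape() throws.
            await tab.close();
        }
    }
    return results;
}
```

The callback is where the question's await page.waitForSelector('#js-ex-content') and the rest of the scraping would go, called on the tab it receives. Since the listing page is never navigated away from, page.goBack() is no longer needed.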

If you need to look up more, refer to the API documentation Puppeteer provides.
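Because the detail pages open in their own tabs, the listing page keeps whatever Load More has already revealed, so one option is to expand the list first and collect the links once afterwards. A rough sketch, assuming the button is removed from the DOM once everything is loaded (the try/catch in the question hints at this, but it is not confirmed here); loadAllRows, delayMs and maxClicks are hypothetical names, and the button selector is a parameter because the question does not show it:

```javascript
// Sketch: click a "load more" control until it is no longer in the DOM.
// `page` is a Puppeteer Page; `loadMoreSelector` must match the button.
async function loadAllRows(page, loadMoreSelector, { delayMs = 1000, maxClicks = 100 } = {}) {
    let clicks = 0;
    // maxClicks is a safety cap in case the button never disappears.
    while (clicks < maxClicks) {
        const button = await page.$(loadMoreSelector); // resolves to null when absent
        if (button === null) break;
        await button.click();
        clicks += 1;
        // Fixed pause, mirroring the one-second wait in the question.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
    return clicks;
}
```

The return value is the number of clicks performed, which can help confirm that the loop actually ran. If the button is hidden rather than removed, the stop condition would need to check visibility instead.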

Upvotes: 1
