Reputation: 1
I'm facing the following problem with my Puppeteer crawler: the site I'm scraping has paginated results, and you navigate to the next page by clicking an arrow at the bottom of the page (there is no usable href on the link, so the click on the button has to be simulated). On each page, I need to scrape the details of all the items (real estate cards, 30 cards per page).
The question is: how to navigate to all following pages, and scrape all cards on each page?
What I've done: on the start URL, I fill in and submit a form, which returns the first 30 results for my request. Then I loop on the selector matching the arrow at the bottom of the page and click it until the selector is no longer present. The navigation works, but the scraper doesn't collect the card links on each page. Only the first 30 cards are scraped, and then the scraper stops.
async function pageFunction(context) {
    switch (context.request.userData.label) {
        case 'START': return handleStart(context);
        case 'DETAIL': return handleDetail(context);
    }

    async function handleStart({ log, page, customData }) {
        // fill in form and submit to get the results page
        await page.click(home.submitSearch);
        // wait for some selectors on the first results page
        await page.waitForSelector(searchResults.card);
        await page.waitForSelector(searchResults.blockNavigation);
        // navigate with pagination
        while (await page.$(searchResults.nextPage) !== null) {
            await page.waitForSelector(searchResults.card);
            await page.waitForSelector(searchResults.blockNavigation);
            await page.click(searchResults.nextPage);
        }
    }

    async function handleDetail({ request, log, skipLinks, page }) {
        const description = await page.$eval(descriptionSelector, (el) => el.textContent);
        return { description };
    }
}
The 'START' label matches the start URL with the form.
The 'DETAIL' label matches the links to the individual cards on the results pages.
Any idea on how to handle this case?
Upvotes: 0
Views: 3231
Reputation: 468
This is a typical problem in web scraping. It looks like the website uses XHR requests to load additional data after you click the next button.
It is hard to give concrete advice without knowing the structure of the website and how it works, but you can try one of these two approaches:
1) Use the website's XHR requests to get the data directly. You can use the browser's developer tools to inspect the XHR requests and replicate them in your crawler.
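For example, once you have found the underlying endpoint in the Network tab, you can call it directly and page through the results. Everything in this sketch (the endpoint URL, the `page` query parameter, and the `results` field in the response) is an assumption; substitute whatever the real request and response actually look like:

```javascript
// Hypothetical endpoint — find the real one in the browser's Network tab.
const ENDPOINT = 'https://example.com/api/search';

// Build the URL for a given results page (assumed "page" query parameter).
function buildPageUrl(baseUrl, page) {
    const url = new URL(baseUrl);
    url.searchParams.set('page', String(page));
    return url.toString();
}

// Fetch result pages until one comes back empty (requires Node 18+ for fetch).
async function fetchAllPages(maxPages) {
    const cards = [];
    for (let page = 1; page <= maxPages; page++) {
        const res = await fetch(buildPageUrl(ENDPOINT, page));
        const data = await res.json(); // response shape depends on the site
        if (!data.results || data.results.length === 0) break;
        cards.push(...data.results);
    }
    return cards;
}
```

This is usually much faster and more reliable than clicking through the UI, because each page is a single HTTP request and there is no rendering or timing involved.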
2) The approach you already tried: wait and click the next button in a loop, and collect the data from each page until there is no next button left. I don't see an obvious issue in your current code, but it depends on the pseudo-URLs and clickable selectors you configured in Puppeteer Scraper. If you set them correctly, it should work.
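With this approach, the usual pitfalls are clicking next without first enqueueing the current page's cards, and not waiting for the XHR-driven list to actually change after the click. A rough sketch of the loop, where the selectors and the `enqueueRequest` helper are assumptions based on your snippet and the Apify pageFunction context:

```javascript
// Placeholder selectors — replace with the real ones from your project.
const searchResults = {
    card: '.result-card',         // assumption
    cardLink: '.result-card a',   // assumption
    nextPage: '.pagination-next', // assumption
};

// Build the request object for a card link (label matches your handleDetail).
function toDetailRequest(url) {
    return { url, userData: { label: 'DETAIL' } };
}

// Enqueue every card on the current page, then click next and wait for
// the list to change, until there is no next button left.
async function paginate({ page, enqueueRequest }) {
    do {
        await page.waitForSelector(searchResults.card);
        const links = await page.$$eval(
            searchResults.cardLink,
            (els) => els.map((el) => el.href),
        );
        for (const url of links) {
            await enqueueRequest(toDetailRequest(url));
        }
        const next = await page.$(searchResults.nextPage);
        if (!next) break;
        // Remember a card from the old page so we can detect the swap.
        const firstCard = await page.$eval(searchResults.card, (el) => el.outerHTML);
        await next.click();
        // Wait until the result list actually changes (XHR-driven update).
        await page.waitForFunction(
            (sel, old) => {
                const el = document.querySelector(sel);
                return el && el.outerHTML !== old;
            },
            {},
            searchResults.card,
            firstCard,
        );
    } while (true);
}
```

The key difference from your loop is that the card links are enqueued on every iteration, and the click is followed by an explicit wait for new content instead of moving straight to the next `page.$` check.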
Anyway, there is an excellent tutorial on how to handle pagination in Puppeteer Scraper; you can check it out.
Upvotes: 1