Kasper Hansen

Reputation: 6557

Cannot get querySelectorAll to work with puppeteer (returns undefined)

I'm trying to practice some web scraping with prices from a supermarket, using Node.js and Puppeteer. I can navigate through the website at the beginning, accepting cookies and clicking a "load more" button. But when I try to read the divs containing the products with querySelectorAll, I get stuck. It returns undefined even though I wait for a specific div to be present. What am I missing?

The problem is at the end of the code block.

const { product } = require("puppeteer");

const scraperObjectAll = {
    url: 'https://www.bilkatogo.dk/s/?query=',
    async scraper(browser) {
        let page = await browser.newPage();
        console.log(`Navigating to ${this.url}`);
        await page.goto(this.url);

        // accept cookies
        await page.evaluate(_ => {
            CookieInformation.submitAllCategories();
        });

        var productsRead = 0;
        var productsTotal = Number.MAX_VALUE;

        while (productsRead < 100) {
            // Wait for the required DOM to be rendered
            await page.waitForSelector('button.btn.btn-dark.border-radius.my-3');
            // Click button to read more products
            await page.evaluate(_ => {
                document.querySelector("button.btn.btn-dark.border-radius.my-3").click()
            });
            // Wait for it to load the new products
            await page.waitForSelector('div.col-10.col-sm-4.col-lg-2.text-center.mt-4.text-secondary');
            // Get number of products read and total
            const loadProducts = await page.evaluate(_ => {
                let p = document.querySelector("div.col-10.col-sm-4.col-lg-2").innerText.replace("INDLÆS FLERE", "").replace("Du har set ","").replace(" ", "").replace(/(\r\n|\n|\r)/gm,"").split("af ");
                return p;
            });

            console.log("Products (read/total): " + loadProducts);
            productsRead = loadProducts[0];
            productsTotal = loadProducts[1];

            // Now waiting for a div element
            await page.waitForSelector('div[data-productid]');

            const getProducts = await page.evaluate(_ => {
                return document.querySelectorAll('div');
            });

            // PROBLEM HERE!
            // Cannot convert undefined or null to object
            console.log("LENGTH: " + Array.from(getProducts).length);
        }

Upvotes: 3

Views: 4294

Answers (1)

CertainPerformance

Reputation: 370699

The callback passed to page.evaluate runs in the page context inside the browser, not in the scope of the Node script. Values can't be passed freely between the page and the Node script: most importantly, anything that isn't serializable (convertible to plain JSON) can't be transferred. In the question's code, that is why getProducts comes back undefined.

querySelectorAll returns a NodeList, and NodeLists exist only in the browser, not in Node. Likewise, the HTMLElements that a NodeList contains exist only in the browser.
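As a rough illustration of what "serializable" means here, the sketch below pushes two objects through a JSON round trip in plain Node. This is only an analogy: Puppeteer's actual transfer mechanism differs in detail, and roundTrip is a hypothetical helper, not part of any library.

```javascript
// Analogy only: a JSON round trip stands in for the browser-to-Node transfer.
// `roundTrip` is a hypothetical helper written for this illustration.
function roundTrip(value) {
  return JSON.parse(JSON.stringify(value));
}

// Plain data (strings, numbers, arrays, plain objects) survives intact.
const plain = roundTrip({ id: '42', price: 19.95 });

// Anything that is not plain data, such as a method, is silently dropped.
const withMethod = roundTrip({ id: '42', click() {} });

console.log(plain);      // { id: '42', price: 19.95 }
console.log(withMethod); // { id: '42' }
```

DOM nodes are in the second group: whatever makes them useful lives in the browser, so only values extracted from them can make the trip.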

Put all the logic that requires using the data that exists only on the front-end inside the .evaluate callback, for example:

const numberOfDivs = await page.evaluate(_ => {
  return document.querySelectorAll('div').length;
});

or

const firstDivText = await page.evaluate(_ => {
  return document.querySelector('div').textContent;
});
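Applied to the question, one option is to build an array of plain strings inside the browser and return only that. The sketch below uses page.$$eval, which runs querySelectorAll in the page and passes the matches to the callback as an array. getProductIds is a hypothetical helper name, and reading the data-productid attribute is an assumption based on the selector the question waits for.

```javascript
// Sketch: extract plain, serializable data in the browser and return only that.
// `getProductIds` is a hypothetical helper; the selector is taken from the question.
async function getProductIds(page) {
  // The callback runs in the page context, so DOM access is fine here.
  // Only the resulting array of strings is sent back to Node.
  return page.$$eval('div[data-productid]', (divs) =>
    divs.map((div) => div.getAttribute('data-productid'))
  );
}
```

Inside the question's loop this could be called as const ids = await getProductIds(page); and ids.length would then give the number of product divs.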

Upvotes: 6
