nanquim

Reputation: 1924

Node.js/Puppeteer - DOM NodeList to JS Object

I want to transform a NodeList into an object.

H1 is object.name and so on.

I still can't understand the exact behavior of page.evaluate().

This is what I need:

[image: desired result]

And this is one of my attempts, but gp is always undefined:

await page.waitForNavigation();

const selG = 'body > div.content-home > div > div.box > div > div:nth-child(2) > div.col-md-12.no-padding > div:nth-child(4) > div:nth-child(2) > div.col-xs-12';
await page.waitForSelector(selG);
const g = await page.evaluate( (selG) => {
    let gp = document.querySelector(selG); //null
    let n = Array.from(gp.querySelectorAll('h1'), element => element.textContent);
    console.log(n[0]);
    return n;
});

Upvotes: 1

Views: 1427

Answers (2)

NoriSte

Reputation: 3709

page.evaluate() runs the function you pass it directly in the browser, where it has no access to the scope (the variables) of the Node.js script that launched Puppeteer.

To fully understand, try this:

1 - copy your function as is

2 - wrap it in a self-invoking function, ([your-function])(); the result is the following (I added one extra console.log(selG); line)

((selG) => {
  console.log(selG); // I added this line
  let gp = document.querySelector(selG);
  let n = Array.from(gp.querySelectorAll('h1'), element => element.textContent);
  console.log(n[0]);
  return n;
})()

3 - paste it directly into the devtools console

By doing so you do more or less (from an understanding perspective) what page.evaluate() does: run the function you pass it directly in the browser. What's the result? It is Cannot read property 'querySelectorAll' of null because, as you noted, gp is null.

But concentrate on the console.log(selG); line I added: it logs undefined. That's the big issue!

Why does it happen?

Take a look at the function itself: inside it, selG is only a parameter name, and no value is ever passed for it, so it is undefined and document.querySelector(selG) finds nothing. The selG that holds your selector is defined in the script you used to launch Puppeteer, but the function you pass to page.evaluate() runs in the browser, not in the Node execution context.
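The same effect can be reproduced without a browser. The following is only a rough sketch: evaluateElsewhere is a hypothetical helper (not a Puppeteer API), and Node's built-in vm module stands in for the page context. It assumes, as a simplification, that the function is shipped as source text and called in a fresh context, so nothing from the surrounding Node script comes along unless it is passed as an argument.

```javascript
const vm = require('vm');

// Exists only in this Node script, like selG in the question.
const selG = 'div.col-xs-12';

// Hypothetical stand-in for page.evaluate(): sends the function's source
// text to a separate, empty context and calls it there with the arguments.
function evaluateElsewhere(fn, ...args) {
  const source = `(${fn.toString()})(...${JSON.stringify(args)})`;
  return vm.runInNewContext(source, {});
}

// 1. Parameter declared, but no argument passed (the question's situation).
const withoutArg = evaluateElsewhere((selG) => typeof selG);

// 2. Argument passed explicitly after the function.
const withArg = evaluateElsewhere((sel) => sel, selG);

// 3. Relying on the outer variable: it does not exist in the other context.
let closureError = null;
try {
  evaluateElsewhere(() => selG);
} catch (e) {
  closureError = e.name;
}

console.log(withoutArg);   // undefined
console.log(withArg);      // div.col-xs-12
console.log(closureError); // ReferenceError
```

Only the second call sees the selector, because it is the only one that receives it as an argument.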

Quoting the Puppeteer docs directly:

page.evaluate(pageFunction, ...args)

pageFunction Function to be evaluated in the page context

...args <...Serializable|JSHandle> Arguments to pass to pageFunction

Use (as Grant said) the ...args parameters that follow the function to pass the selG variable to your function.

Here is your original code with a small change:

await page.waitForNavigation();

const selG = 'body > div.content-home > div > div.box > div > div:nth-child(2) > div.col-md-12.no-padding > div:nth-child(4) > div:nth-child(2) > div.col-xs-12';
await page.waitForSelector(selG);
const g = await page.evaluate( (SELECTOR) => {
    let gp = document.querySelector(SELECTOR);
    let n = Array.from(gp.querySelectorAll('h1'), element => element.textContent);
    console.log(n[0]);
    return n;
}, selG);

Please note:

  • I pass the selG variable (last line) to pageFunction (your function)

  • the pageFunction receives that value and stores it in the SELECTOR parameter

  • the pageFunction then uses the SELECTOR it received

To summarize: the function passed to page.evaluate() CAN'T use variables declared outside it, because it runs in the browser, a context separate from your Node.js script (the one written to launch Puppeteer).

Try my code; it should work without any change. Let me know if it's clear enough.
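The code above returns an array of strings, while the question asks for an object. One possible way to get there is sketched below. It assumes every value sits in an h1 inside the container (as in the attempted code) and that you supply the property names yourself. textsToObject, readHeadings and every key other than name are hypothetical names used for illustration.

```javascript
// Pure helper: pairs each text with a key. It has no DOM dependency,
// so it can be checked in Node without a browser.
function textsToObject(texts, keys) {
  const result = {};
  keys.forEach((key, i) => {
    // Missing entries become an empty string instead of undefined.
    result[key] = (texts[i] || '').trim();
  });
  return result;
}

// Assumes `page` is a Puppeteer Page and `selector` matches the container.
async function readHeadings(page, selector, keys) {
  const texts = await page.evaluate((sel) => {
    const container = document.querySelector(sel);
    if (!container) return []; // avoid "Cannot read property ... of null"
    return Array.from(container.querySelectorAll('h1'), (el) => el.textContent);
  }, selector); // the selector is passed in explicitly
  return textsToObject(texts, keys);
}
```

It could be called as const g = await readHeadings(page, selG, ['name', 'city']); where 'city' is only a placeholder key. The mapping is done in Node rather than in the page, so only plain strings have to cross the boundary between the two contexts.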

BONUS

Remember that if you want to read some DOM-related data, you have at least three different ways to do it.

Below is an example of mine where I want to read the href of the first link I find in a page. The first example uses page.evaluate() as you did; the other two show a different approach using other Puppeteer APIs.

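const SELECTOR = '[href]:not([href=""])';
let link;

// compare the three following examples, they all read the first link's href
link = await page.evaluate((sel) => 
    document.querySelector(sel).getAttribute('href')
, SELECTOR);
link = await page.$eval(SELECTOR, el => el.getAttribute('href'));
// page.$() and getProperty() both return promises, so each step must be awaited.
// Note: the href property is the resolved URL, which can differ from the raw attribute.
const handle = await page.$(SELECTOR);
link = await (await handle.getProperty('href')).jsonValue();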

Upvotes: 3

Grant Miller

Reputation: 29037

You must pass the variable selG to page.evaluate() using the following method:

const g = await page.evaluate(selG => { /* ... */ }, selG);

Note: I added selG as a separate argument after the page function.

page.evaluate(pageFunction, ...args)

This should prevent document.querySelector(selG) from returning null.

Upvotes: 1
