Reputation: 1924
I want to transform a NodeList into an object.
H1 is object.name
and so on.
I still can't understand the exact behavior of page.evaluate().
This is what I need:
And this is one of my attempts, but gp is always undefined:
await page.waitForNavigation();
const selG = 'body > div.content-home > div > div.box > div > div:nth-child(2) > div.col-md-12.no-padding > div:nth-child(4) > div:nth-child(2) > div.col-xs-12';
await page.waitForSelector(selG);
const g = await page.evaluate( (selG) => {
let gp = document.querySelector(selG); //null
let n = Array.from(gp.querySelectorAll('h1'), element => element.textContent);
console.log(n[0]);
return n;
});
Upvotes: 1
Views: 1427
Reputation: 3709
page.evaluate() runs the function you pass to it directly in the browser, so it has no access to the scope (the variables) of the Node.js script that launched Puppeteer.
To fully understand, try this:
1 - copy your function as is
2 - wrap it in a self-invoking function, ([your-function])(); the result is the following (I added one more line, console.log(selG);)
((selG) => {
console.log(selG); // I added this line
let gp = document.querySelector(selG);
let n = Array.from(gp.querySelectorAll('h1'), element => element.textContent);
console.log(n[0]);
return n;
})()
3 - paste it directly into the DevTools console
By doing so you're doing more or less (from an understanding perspective) what page.evaluate() does, that is, running the function you pass to it directly in the browser.
What's the result? It is Cannot read property 'querySelectorAll' of null because, as you noted, gp is null.
But concentrate on the console.log(selG); I added: it logs undefined. That's the big issue!
Why does it happen?
Take a look at the function itself: selG is a parameter that never receives a value, so document.querySelector(selG) can't find anything. selG is defined in the script you used to launch Puppeteer, but the function you pass to page.evaluate() runs in the browser, not in the Node execution context.
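The same effect can be reproduced without a browser. The following is a minimal sketch that uses Node's built-in vm module; evaluateElsewhere is a hypothetical stand-in for page.evaluate() (not Puppeteer's actual implementation), and the short selector is a placeholder. It only illustrates that a function run in another context sees its own arguments and nothing from the calling script.

```javascript
const vm = require('node:vm');

// Hypothetical stand-in for page.evaluate(): turn the function and its
// arguments into source text, then run that text in a fresh context that
// shares no variables with this script.
function evaluateElsewhere(pageFunction, ...args) {
  const source = `(${pageFunction.toString()})(...${JSON.stringify(args)})`;
  return vm.runInNewContext(source, {});
}

const selG = 'div.col-xs-12'; // placeholder selector

// No argument passed: the parameter is undefined on the other side.
const withoutArg = evaluateElsewhere((selG) => typeof selG);

// Argument passed after the function: the parameter receives the value.
const withArg = evaluateElsewhere((SELECTOR) => SELECTOR, selG);

// Closure reference: the outer variable does not exist in the other context.
let closureError = null;
try {
  evaluateElsewhere(() => selG);
} catch (err) {
  closureError = err.name;
}

console.log(withoutArg, withArg, closureError);
```

The first call mirrors the original attempt: the parameter is named selG, but nothing is passed for it, so it shadows the outer variable and stays undefined.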
Quoting the Puppeteer docs directly:
page.evaluate(pageFunction, ...args)
pageFunction Function to be evaluated in the page context
...args <...Serializable|JSHandle> Arguments to pass to pageFunction
Use (as Grant said) the rest arguments that follow pageFunction to pass the selG variable to your function.
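The docs describe the arguments as Serializable, which is why a selector string travels fine. As a rough illustration only, the sketch below uses a JSON round trip to approximate what can cross the boundary; Puppeteer's real serialization differs in its details, and roundTrip, tag, limit and pick are hypothetical names made up for this example.

```javascript
// Rough approximation of sending a value to another context: a JSON round trip.
// This is an assumption for illustration, not how Puppeteer is implemented.
function roundTrip(value) {
  return JSON.parse(JSON.stringify(value));
}

// Strings survive unchanged.
const selector = roundTrip('div.col-xs-12');

// Plain data objects survive too.
const options = roundTrip({ tag: 'h1', limit: 3 });

// Functions are not serializable: the `pick` property is dropped.
const withFn = roundTrip({ tag: 'h1', pick: (el) => el.textContent });

console.log(selector, options, withFn);
```

In practice this means passing plain data (strings, numbers, arrays, simple objects) as arguments and returning plain data, such as the array of textContent strings in the question.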
Here is your original code with a small change:
await page.waitForNavigation();
const selG = 'body > div.content-home > div > div.box > div > div:nth-child(2) > div.col-md-12.no-padding > div:nth-child(4) > div:nth-child(2) > div.col-xs-12';
await page.waitForSelector(selG);
const g = await page.evaluate( (SELECTOR) => {
let gp = document.querySelector(SELECTOR);
let n = Array.from(gp.querySelectorAll('h1'), element => element.textContent);
console.log(n[0]);
return n;
}, selG);
Please note that:
I pass the selG variable (last line) to pageFunction (your function);
pageFunction receives the value and stores it in its SELECTOR parameter;
pageFunction then uses the SELECTOR it received.
To summarize: the function passed to page.evaluate() CAN'T use variables declared outside it, because it runs in the browser, a context separate from the Node.js script you wrote to launch Puppeteer.
Try my code; it should work without any change. Let me know if it's clear enough.
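Once the array of heading texts comes back to Node, turning it into the object you asked for is plain JavaScript. Here is one possible sketch; textsToObject is a hypothetical helper, the sample texts are made up, and every key name except name is a placeholder because the question only specifies object.name.

```javascript
// Hypothetical helper: pairs each heading text with a key name, in order.
// Keys without a matching text are skipped.
function textsToObject(texts, keys) {
  const result = {};
  keys.forEach((key, i) => {
    if (i < texts.length) result[key] = texts[i].trim();
  });
  return result;
}

// Stand-in for the array returned by page.evaluate() (made-up sample values).
const n = ['  Example name ', 'Second heading'];

// `second` is a placeholder key; only `name` comes from the question.
const obj = textsToObject(n, ['name', 'second']);

console.log(obj);
```

The same mapping could also be done inside the page function, as long as what it returns is plain data.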
BONUS
Remember that if you want to read DOM-related data, you have at least three different ways to do it.
Below is an example of mine that reads the href attribute of the first link found in a page. The first example uses page.evaluate() as you did; the other two show a different approach using other Puppeteer APIs.
const SELECTOR = '[href]:not([href=""])';
let link;
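// compare the three following examples; they all read the link's href
link = await page.evaluate((sel) =>
document.querySelector(sel).getAttribute('href')
, SELECTOR);
link = await page.$eval(SELECTOR, el => el.getAttribute('href'));
// page.$() returns a promise, so each step must be awaited before chaining;
// the href property is the resolved URL, which can differ from the raw attribute
const handle = await page.$(SELECTOR);
link = await (await handle.getProperty('href')).jsonValue();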
Upvotes: 3
Reputation: 29037
You must pass the variable selG to page.evaluate() as follows:
const g = await page.evaluate(selG => { /* ... */ }, selG);
Note that I added selG as a separate argument after the page function, following the signature page.evaluate(pageFunction, ...args).
This should prevent document.querySelector(selG) from returning null.
Upvotes: 1