Reputation: 359
I'm using PhantomJS to parse some content and extract information from it (the maximum image size on a page, for example). I've decided to move to Puppeteer, and I've run into an issue: the functions I ran in PhantomJS worked directly with document node elements. As far as I understand, in Puppeteer it's impossible to return a node element from page.evaluate and similar functions. Is there any way to work around this, or do I have to use another library? Thank you!
Upvotes: 12
Views: 14584
Reputation: 333
The answer from Grant Miller discusses some methods and gives documentation links, but doesn't have any code. Here is some demonstration code that shows:
- the page.$ method
- the classList.add method in the context of a page.evaluate method call
Code:
const parameters = {
    "launchParameters": { "args": [] },
    "gotoURI": "https://example.com",
    "marginSpecification": {"top": "0", "right": "0", "bottom": "0", "left": "0"},
    "pdfPath": "example.pdf",
    "styleTag":
        'body.orangey, body.orangey div {background-color: orange;}',
    "addBodyClass": "orangey",
    "footerTemplate": "<div></div>",
    "headerTemplate": "<div></div>",
};
console.log("Node version: " + process.version);
const puppeteer = require("puppeteer");
(async () => {
    console.log("await puppeteer.launch");
    const browser = await puppeteer.launch(parameters.launchParameters);
    console.log("await browser.newPage");
    const page = await browser.newPage();
    console.log("await page.goto");
    await page.goto(parameters.gotoURI, {waitUntil: 'networkidle2'});
    console.log("await page.addStyleTag");
    await page.addStyleTag({
        "content": parameters.styleTag
    });
    if (!!parameters.addBodyClass) {
        console.log("await page dollar.");
        const bodyHandle = await page.$('body');
        console.log("Body handle", (!!bodyHandle) ? "OK." : "no.");
        console.log(`await add class "${parameters.addBodyClass}"`);
        await page.evaluate(
            (body, addBodyClass) => body.classList.add(addBodyClass),
            bodyHandle, parameters.addBodyClass)
            .catch(error => console.log(error));
        console.log("await body handle dispose.");
        await bodyHandle.dispose();
    }
    const pdfOptions = {
        path: parameters.pdfPath,
        format: 'A4',
        margin: parameters.marginSpecification,
        displayHeaderFooter: true,
        printBackground: true,
        footerTemplate: parameters.footerTemplate,
        headerTemplate: parameters.headerTemplate
    };
    console.log("await page.pdf");
    await page.pdf(pdfOptions);
    console.log("await browser.close");
    await browser.close();
})();
Reference documentation for classList can be found here, for example:
https://developer.mozilla.org/en-US/docs/Web/API/Element/classList
Upvotes: 4
Reputation: 29037
There are two environments to consider when using Puppeteer: the Node.js environment and the page DOM environment.
The Node.js environment is built upon Google's Chrome V8 JavaScript engine.
Chrome V8 describes its relation to the DOM:
JavaScript is most commonly used for client-side scripting in a browser, being used to manipulate Document Object Model (DOM) objects for example. The DOM is not, however, typically provided by the JavaScript engine but instead by a browser. The same is true of V8—Google Chrome provides the DOM. V8 does however provide all the data types, operators, objects and functions specified in the ECMA standard.
In other words, the DOM is not provided by default to Node.js.
This means that Node.js does not have the capability to interpret DOM elements on its own.
This is where Puppeteer comes in.
The Puppeteer function page.evaluate() allows you to evaluate an expression in the current Page DOM context using Chrome or Chromium.
The Puppeteer documentation describes what happens when you attempt to return a non-serializable value, like a DOM element:
If the function passed to the page.evaluate returns a non-Serializable value, then page.evaluate resolves to undefined.
Again, this is because Node.js does not know how to interpret DOM elements without help.
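Since only serializable values come back from page.evaluate(), one workaround is to do the DOM work inside the page and return plain data instead of the element itself. The following is a minimal sketch, assuming `page` is a Puppeteer Page that has already loaded a document. The function names `largestImageInPage` and `getLargestImage` are made up for illustration, and "largest" is taken to mean the greatest natural pixel area, which is only one possible reading of "max image size".

```javascript
// Runs inside the page: `document` here is the browser DOM, not a Node.js object.
// It returns a plain object of strings and numbers, so the result can be serialized.
function largestImageInPage() {
  let best = null;
  for (const img of Array.from(document.images)) {
    const area = img.naturalWidth * img.naturalHeight;
    if (best === null || area > best.area) {
      best = {
        src: img.src,
        width: img.naturalWidth,
        height: img.naturalHeight,
        area: area,
      };
    }
  }
  return best; // null when the page has no images
}

// Runs in Node.js: `page` is assumed to be a Puppeteer Page.
// Puppeteer sends the function's source to the page and returns its result.
async function getLargestImage(page) {
  return page.evaluate(largestImageInPage);
}
```

It could be called as `const info = await getLargestImage(page);` after `page.goto(...)`. Because the function body is shipped to the browser as source text, it should not rely on variables from the surrounding Node.js scope.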
As a result, Puppeteer has implemented an ElementHandle class which represents an in-page DOM element.
You can use elementHandle.$(), elementHandle.$$(), or elementHandle.$x() to return ElementHandles back to Node.js.
An ElementHandle is a reference to the in-page element rather than a copy of it, so it can be held in the Node.js environment and passed back into page.evaluate() as an argument.
Therefore, if you need to manipulate an element directly, you can do so inside page.evaluate(). If you need to access a representation of an element, use page.$() or one of its related functions.
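To make the two options concrete, here is a rough sketch that fetches a handle with page.$(), passes it into page.evaluate() to read some properties, and then disposes of it. It assumes `page` is a Puppeteer Page; the helper name `describeElement` and the choice of returned fields are hypothetical.

```javascript
// `page` is assumed to be a Puppeteer Page; `selector` is any CSS selector.
async function describeElement(page, selector) {
  // page.$ resolves to an ElementHandle, or null if nothing matches.
  const handle = await page.$(selector);
  if (handle === null) {
    return null;
  }
  try {
    // Inside the page the handle is a real DOM element again,
    // so only plain data (a string and an array of strings) is returned.
    return await page.evaluate(
      (el) => ({ tag: el.tagName, classes: Array.from(el.classList) }),
      handle
    );
  } finally {
    // Release the in-page reference once it is no longer needed.
    await handle.dispose();
  }
}
```

For example, `await describeElement(page, 'body')` would resolve to an object with the body's tag name and class list, while a selector that matches nothing resolves to null.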
Upvotes: 9