Reputation: 12335
I am trying to extract a few URLs from this page with Puppeteer. However, all my script returns is undefined:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  await page.goto('https://divisare.com/');

  let projects = await page.evaluate((sel) => {
    return document.getElementsByClassName(sel)
  }, 'homepage-project-image');

  var aNode = projects[0].href;
  console.log(aNode);
  console.log(projects.length)
  browser.close();
}

run();
However, when I run something like the code below, I at least get the correct count of the links I am trying to extract:
let projects = await page.evaluate((sel) => {
  return document.getElementsByClassName(sel).length
}, 'homepage-project-image');
console.log(projects);
Am I accessing my projects HTMLCollection incorrectly? What am I missing here? Thanks.
Upvotes: 4
Views: 5052
Reputation: 6713
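Puppeteer cannot return a non-serializable value from a page.evaluate call (see this issue and the following PR). An HTMLCollection of DOM nodes is not serializable, so the call resolves to undefined instead of the collection. A plain number such as .length is serializable, which is why your second snippet works.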
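As a rough illustration of why node data gets lost on the way back, here is a Node-only analogy. It is only a sketch: Puppeteer transfers values over the DevTools protocol, not through JSON.stringify, and FakeAnchor is a hypothetical stand-in for a real anchor element. The point it shows is that href lives on a prototype getter, so a by-value copy of the object carries nothing useful, while the string itself copies fine.

```javascript
// Hypothetical stand-in for a DOM anchor: like a real <a> element, it exposes
// `href` through a prototype getter rather than an own data property.
class FakeAnchor {
  #url;
  constructor(url) {
    this.#url = url;
  }
  get href() {
    return this.#url;
  }
}

const node = new FakeAnchor('https://example.com/project');

// Approximation of a by-value transfer: only own enumerable data survives.
const roundTripped = JSON.parse(JSON.stringify([node]));

// Extracting the string first gives a value that survives the copy.
const survived = JSON.parse(JSON.stringify([node.href]));

console.log(node.href);            // https://example.com/project
console.log(roundTripped[0].href); // undefined
console.log(survived[0]);          // https://example.com/project
```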
One way to solve this is to return the serializable value you actually need, here the href string of the first match:
let projects = await page.evaluate((sel) => {
  return document.getElementsByClassName(sel)[0].href;
}, 'homepage-project-image');
Remember that document.getElementsByClassName returns an HTMLCollection, which has no map method, so if you want to iterate over the results you need to convert it to an array first, something like:
let projects = await page.evaluate((sel) => {
  return Array.from(document.getElementsByClassName(sel)).map(node => node.href);
}, 'homepage-project-image');
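If it helps, the same idea can be pulled into a named function so the in-page logic can be checked without launching a browser. This is a minimal sketch, not the only way to do it: hrefsByClass and fakeDocument are hypothetical names introduced here, and the doc parameter exists only so a stand-in object can be passed in outside the page.

```javascript
// Hypothetical helper meant to be passed to page.evaluate. It only touches its
// arguments and, by default, the page's global `document`.
function hrefsByClass(className, doc = document) {
  // HTMLCollection has no .map(), so Array.from copies it into a real array
  // and maps each node to its href string in one step.
  return Array.from(doc.getElementsByClassName(className), (node) => node.href);
}

// Assumed usage inside the Puppeteer script (not executed here):
//   const links = await page.evaluate(hrefsByClass, 'homepage-project-image');

// Quick check outside the browser with a stand-in for `document` that returns
// an array-like object, similar in shape to an HTMLCollection.
const fakeDocument = {
  getElementsByClassName: (cls) =>
    cls === 'homepage-project-image'
      ? {
          length: 2,
          0: { href: 'https://example.com/a' },
          1: { href: 'https://example.com/b' },
        }
      : { length: 0 },
};

const links = hrefsByClass('homepage-project-image', fakeDocument);
console.log(links);
```

Because the function returns an array of strings, the result is serializable and comes back from page.evaluate intact.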
Upvotes: 4