PostAlmostAnything

Reputation: 145

How Do I Run page.screenshot Inside page.evaluate in Puppeteer?

I am creating a screen scraper that needs to scrape the content of a page and take a screenshot of it. I am using Puppeteer for that, but I have hit a snag. When I call a function that runs page.screenshot from inside page.evaluate, I get an error saying the function is not defined.

Here is my code:

async function getContent(clink, ce, networkidle, host, filepath) {
        let browser = await puppeteer.launch();
        let cpage = await browser.newPage();
        await cpage.goto(clink, { waitUntil: networkidle });
        let content = await cpage.evaluate((clink, ce, networkidle, host, filepath, pubDate) => {
            let results = '';
            let enclurl = clink;
            takeScreenshot(enclurl, filepath, networkidle)
                .then(() => {
                    console.log("Screenshot taken");
                })
                .catch((err) => {
                    console.log("Error occurred!");
                    console.dir(err);
                });
            results += '<title><![CDATA[' + 'test' + ']]></title>';
            results += '<description><![CDATA[' + '<img src="' + host + filepath.slice(1) + '">' + document.querySelector(ce).innerHTML + ']]></description>';
            results += '<link>' + clink + '</link>';
            results += '<guid>' + clink + '</guid>';
            results += '<pubDate>' + pubDate + '</pubDate>';
            return results;
        }, clink, ce, networkidle, host, filepath, pubDate);
        await cpage.close();
        await browser.close();
        return content;
    }

That code should return the items from which an RSS-format XML file is then created. The URLs of those files will be added to WPRobot campaigns. The end goal is a search engine that uses WordPress to aggregate the main content of pages together with full screenshots of the sources.

The takeScreenshot function is as follows:

async function takeScreenshot(enclurl, filepath, networkidle) {
        let browser = await puppeteer.launch();
        let page = await browser.newPage();
        await page.goto(enclurl, { waitUntil: networkidle });
        let buffer = await page.screenshot({
            path: filepath
        });

        await page.close();
        await browser.close();
    }

takeScreenshot works just fine when called outside of page.evaluate. The exact error I get says "takeScreenshot is undefined." I have another function that parses RSS feeds and takes screenshots of their source URLs, but it does not use page.evaluate at all.

I have now moved the call to takeScreenshot to an earlier part of my code, right before getContent() is called, but now getContent() always seems to return undefined. My new getContent() reads:

 async function getContent(clink, ce, networkidle) {
        let browser = await puppeteer.launch();
        let cpage = await browser.newPage();
        await cpage.goto(clink, { waitUntil: networkidle });
        let content = await cpage.evaluate((ce) => {
            let cefc = ce.charAt(0);
            if (cefc != '.') {
                ce = '#' + ce;
            }
            console.log('ce=' + ce);
            let results = document.querySelector(ce).innerHTML;
            return results;
        }, ce);
        await cpage.close();
        await browser.close();
        return content;
    }

I am also not seeing the output of console.log('ce=' + ce) in the log. After I moved the console.log out of the page.evaluate callback, it logged the expected value for content, which is the HTML of the element with the specified class. Despite that, the value returned by getContent() remains undefined.

Upvotes: 0

Views: 1302

Answers (1)

Massimo Rebuglio

Reputation: 341

page.evaluate works in a way that is not intuitive:

The code of the function (in your case: (clink, ce, networkidle, host, filepath, pubDate) => {...}) is NOT executed in your script. The function is serialized and sent to the headless browser that Puppeteer controls, and it runs there.
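To make the effect of that serialization concrete, here is a rough simulation that does not need Puppeteer at all. It assumes that Node's built-in vm module is an acceptable stand-in for the browser context, and the stub takeScreenshot and the name pageFunction are only for illustration. The point is that a function turned into text and evaluated somewhere else loses access to everything defined in your script.

```javascript
const vm = require('node:vm');

// Stub defined in the Node script, like takeScreenshot in the question.
function takeScreenshot() {
  return 'screenshot taken';
}

// Stand-in for the callback that would be passed to page.evaluate.
const pageFunction = () => typeof takeScreenshot;

// Called directly, the callback still sees the script's scope.
const inScript = pageFunction();

// Turned into source text and evaluated in a fresh context (a simulation of
// what happens on the browser side), the surrounding scope is gone.
const source = `(${pageFunction.toString()})()`;
const inOtherContext = vm.runInNewContext(source);

console.log(inScript);       // "function"
console.log(inOtherContext); // "undefined"
```

Only the arguments you pass after the callback travel along with it, and those have to be serializable values such as strings and numbers.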

If you want to call a function from inside the evaluate callback, you can usually (but not in this case) use one of these tricks: How to pass a function in Puppeteers .evaluate() method?

BUT in this case there is a problem: takeScreenshot calls other functions, such as puppeteer.launch(), that CAN'T run inside Puppeteer's headless browser. These functions require a lot of dependencies (and some executables), so they can't be passed into the page.

To do what you need, move the screenshot part of your code out of evaluate:

async function getContent(clink, ce, networkidle, host, filepath) {
    let browser = await puppeteer.launch();
    let cpage = await browser.newPage();
    await cpage.goto(clink, { waitUntil: networkidle });
    // The callback runs inside the page, so it only reads the DOM and builds the string.
    // pubDate is not a parameter of getContent; as in the question, it must be defined in an outer scope.
    let content = await cpage.evaluate((clink, ce, networkidle, host, filepath, pubDate) => {
        let results = '';

        results += '<title><![CDATA[' + 'test' + ']]></title>';
        results += '<description><![CDATA[' + '<img src="' + host + '{REPL_ME}' + '">' + document.querySelector(ce).innerHTML + ']]></description>';
        results += '<link>' + clink + '</link>';
        results += '<guid>' + clink + '</guid>';
        results += '<pubDate>' + pubDate + '</pubDate>';
        return results;
    }, clink, ce, networkidle, host, filepath, pubDate);

    // Back in Node: enclurl only existed inside the callback, so pass clink here.
    await takeScreenshot(clink, filepath, networkidle);
    // Fill in the image path, using the same filepath.slice(1) as the question's code.
    content = content.replace('{REPL_ME}', filepath.slice(1));

    await cpage.close();
    await browser.close();
    return content;
}
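A possible simplification, offered only as a sketch: since getContent already has the page open, the screenshot could be taken from that same page instead of launching a second browser in takeScreenshot. The function name getContentAndScreenshot, the selector parameter, the hard-coded 'networkidle2' and the fullPage option are my assumptions, not part of the question's code. The puppeteer module is passed in as an argument so the function does not depend on a global.

```javascript
// Hypothetical variant of getContent that scrapes and screenshots with one page.
// `puppeteer` is injected by the caller, e.g. require('puppeteer').
async function getContentAndScreenshot(puppeteer, clink, selector, filepath) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(clink, { waitUntil: 'networkidle2' });

    // Runs in the browser: DOM access only, with a serializable argument.
    const html = await page.evaluate(
      (sel) => document.querySelector(sel).innerHTML,
      selector
    );

    // Runs in Node: the Puppeteer API is available here, not inside evaluate.
    await page.screenshot({ path: filepath, fullPage: true });

    return html;
  } finally {
    // Close the browser even if navigation or the selector lookup fails.
    await browser.close();
  }
}
```

It could be called as getContentAndScreenshot(require('puppeteer'), clink, '#' + ce, filepath), with the RSS string then built in Node from the returned HTML. Whether one page is enough depends on whether the screenshot really has to come from the same URL that is being scraped, which is the case in the question.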

Upvotes: 2
