Elia Weiss

Reputation: 9866

Puppeteer doesn't close browser

I'm running Puppeteer on Express/Node/Ubuntu as follows:

var puppeteer = require('puppeteer');
var express = require('express');
var router = express.Router();

/* GET home page. */
router.get('/', function(req, res, next) {
    (async () => {
        headless = true;
        const browser = await puppeteer.launch({headless: true, args:['--no-sandbox']});
        const page = await browser.newPage();
        url = req.query.url;
        await page.goto(url);
        let bodyHTML = await page.evaluate(() => document.body.innerHTML);
        res.send(bodyHTML)
        await browser.close();
    })();
});

Running this script multiple times leaves hundreds of zombie Chrome processes:

$ pgrep chrome | wc -l
133

which clog the server.

How do I fix this?

Could running kill from the Express script solve it?

Is there a better way to get the same result other than puppeteer and headless chrome?

Upvotes: 27

Views: 84659

Answers (12)

Ramaraja

Reputation: 2626

This is a simple oversight: if an error occurs, your await browser.close() never executes, leaving you with zombies.

Rather than using shelljs, the better practice is to use try..catch..finally, because you want the browser to be closed regardless of whether everything goes smoothly or an error is thrown. And unlike the other code snippet, you don't have to close the browser in two places: the finally block always executes, whether or not an error was thrown.

So, your code should look like,

const puppeteer = require('puppeteer');
const express = require('express');

const router = express.Router();

/* GET home page. */
router.get('/', function(req, res, next) {
  (async () => {
    const browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox'],
    });

    try {
      const page = await browser.newPage();
      const url = req.query.url;
      await page.goto(url);
      const bodyHTML = await page.evaluate(() => document.body.innerHTML);
      res.send(bodyHTML);
    } catch (e) {
      console.log(e);
    } finally {
      await browser.close();
    }
  })();
});

Hope this helps!

Upvotes: 31

ggorlen

Reputation: 56865

I use the following basic setup for running Puppeteer:

const puppeteer = require("puppeteer");

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();

  /* use the page */
  
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

Here, the finally block guarantees the browser will close correctly regardless of whether an error was thrown. Errors are logged (if desired). I like .catch and .finally as chained calls because the mainline Puppeteer code is one level flatter, but this accomplishes the same thing:

const puppeteer = require("puppeteer");

(async () => {
  let browser;

  try {
    browser = await puppeteer.launch();
    const [page] = await browser.pages();

    /* use the page */
  }
  catch (err) {
    console.error(err);
  }
  finally {
    await browser?.close();
  }
})();

There's no reason to call newPage because Puppeteer starts with a page open.


As for Express, you need only place the entire code above, including let browser; and excluding require("puppeteer"), into your route, and you're good to go, although you might want to use an async middleware error handler.
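
Here's a rough sketch of what that might look like, reusing the question's route and launch flags (treat it as an illustration, not a drop-in; handing errors to next is an assumption about how your app wants to respond):

router.get("/", (req, res, next) => {
  (async () => {
    let browser;
    try {
      browser = await puppeteer.launch({ headless: true, args: ["--no-sandbox"] });
      const [page] = await browser.pages();
      await page.goto(req.query.url);
      res.send(await page.evaluate(() => document.body.innerHTML));
    } catch (err) {
      next(err); // let Express's error handler produce the response
    } finally {
      await browser?.close(); // runs whether the request succeeded or failed
    }
  })();
});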

You ask:

Is there a better way to get the same result other than puppeteer and headless chrome?

That depends on what you're doing and what you mean by "better". If your goal is to get document.body.innerHTML and the page content you're interested in is baked into the static HTML, you can dump Puppeteer entirely and just make a request to get the resource, then use Cheerio to extract the desired information.
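
For example, a minimal sketch of that approach, assuming Node 18+ for the built-in fetch and the cheerio package (names are illustrative):

const cheerio = require("cheerio");

const getBodyHTML = async url => {
  const response = await fetch(url);             // plain HTTP request, no browser
  const $ = cheerio.load(await response.text()); // parse the static HTML
  return $("body").html();                       // rough analogue of document.body.innerHTML
};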

Another consideration is that you may not need to load and close a whole browser per request. If you can use one new page per request, consider the following strategy:

const express = require("express");
const puppeteer = require("puppeteer");

const asyncHandler = fn => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next);

const browserReady = puppeteer.launch({
  args: ["--no-sandbox", "--disable-setuid-sandbox"]
});

const app = express();
app
  .set("port", process.env.PORT || 5000)
  .get("/", asyncHandler(async (req, res) => {
    const browser = await browserReady;
    const page = await browser.newPage();

    try {
      await page.goto(req.query.url || "http://www.example.com");
      return res.send(await page.content());
    }
    catch (err) {
      return res.status(400).send(err.message);
    }
    finally {
      await page.close();
    }
  }))
  .use((err, req, res, next) => res.sendStatus(500))
  .listen(app.get("port"), () =>
    console.log("listening on port", app.get("port"))
  );

Finally, never set any timeouts to 0 (for example, page.setDefaultNavigationTimeout(0);), which introduces the potential for the script to hang forever. If you need a generous timeout, set it to at most a few minutes, long enough not to trigger false positives.
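
For example (the exact value is only illustrative):

page.setDefaultNavigationTimeout(2 * 60 * 1000); // 2 minutes rather than 0 (no limit)
await page.goto(url, { timeout: 2 * 60 * 1000 });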

Upvotes: 4

Iqbal

Reputation: 2304

I run Puppeteer inside Docker (with docker-compose). What worked for me to reap the zombie processes was adding init: true to the service that runs Puppeteer in the docker-compose.yml file.

services:
  web:
    image: alpine:latest
    init: true

References

  1. https://pptr.dev/troubleshooting#running-puppeteer-in-docker
  2. https://stackoverflow.com/a/64394969/3775598
  3. https://docs.docker.com/compose/compose-file/compose-file-v2/#init

Upvotes: 0

araghorn

Reputation: 61

I encountered this issue using the Chromium browser from @sparticuz/chromium. Following the issue forum, closing all the pages made the difference. It seems Chromium opens an extra page or tab, and you need to make sure they are all closed:

const pages = await browser.pages();
await Promise.all(pages.map((page) => page.close()));
await browser.close();

Upvotes: 2

bubjavier

Reputation: 1012

The try-catch-finally approach did not work for me, and going with shelljs' shell.exec('pkill chrome') felt like a desperate move.

In my case, the problem was that I had a Redis call, await cache.set('key', 'value'), somewhere in my code; that client needs to be closed first, so I had to call await cache.quit() before await browser.close(). This solved my issue.

I suggest checking the libraries or modules you use for anything that needs to be closed or quit first, especially clients that keep running without throwing any errors; try-catch won't help there, and they prevent the browser from closing.
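
A minimal sketch of that order of operations, with the node-redis v4 client standing in for whatever long-running client your own code holds open (names are illustrative):

const puppeteer = require('puppeteer');
const { createClient } = require('redis');

(async () => {
  const cache = createClient();
  await cache.connect();
  const browser = await puppeteer.launch();
  try {
    await cache.set('key', 'value');
    /* ... Puppeteer work ... */
  } finally {
    await cache.quit();    // shut the lingering client down first
    await browser.close(); // now the browser can actually exit
  }
})();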

Upvotes: 0

Michał Tomczuk

Reputation: 611

I ran into this issue today myself and found a solution. It seems that Chromium not closing is caused by pages that were never closed. Close all the pages before calling browser.close() and everything should be fine:

const pages = await browser.pages();
for (let i = 0; i < pages.length; i++) {
    await pages[i].close();
}
await browser.close()

Hopefully that helps someone!

Upvotes: 10

Use

 (await browser).close()

That happens because what browser contains is a promise; you have to resolve it before calling close(). I suffered a lot over this, I hope it helps.
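
A small sketch of the situation being described, as I read it (variable names are illustrative):

const puppeteer = require('puppeteer');

const browser = puppeteer.launch(); // no await: `browser` holds a Promise, not a Browser

(async () => {
  const page = await (await browser).newPage();
  /* ... use the page ... */
  await (await browser).close(); // resolve the promise first, then close
})();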

Upvotes: 0

Mukesh

Reputation: 1267

Wrap your code in try-catch-finally like this and see if it helps:

const browser = await puppeteer.launch({headless: true, args:['--no-sandbox']});
try {
  const page = await browser.newPage();
  const url = req.query.url;
  await page.goto(url);
  let bodyHTML = await page.evaluate(() => document.body.innerHTML);
  res.send(bodyHTML);
  await browser.close();
} catch (error) {
  console.log(error);
} finally {
  await browser.close();
}

Upvotes: 22

Tim Kozak

Reputation: 4182

In my experience, the browser process can take some time to exit after close() is called. You can check browser.process() to see whether it still hasn't closed and force-kill it:

if (browser && browser.process() != null) browser.process().kill('SIGINT');

I'm also posting the full code of my Puppeteer resource manager below. Take a look at the bw.on('disconnected', ...) handler.

const puppeteer = require('puppeteer-extra')
const randomUseragent = require('random-useragent');
const StealthPlugin = require('puppeteer-extra-plugin-stealth')

const USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36';
puppeteer.use(StealthPlugin())

function ResourceManager(loadImages) {
    let browser = null;
    const _this = this;
    let retries = 0;
    let isReleased = false;

    this.init = async () => {
        isReleased = false;
        retries = 0;
        browser = await runBrowser();
    };

    this.release = async () => {
        isReleased = true;
        if (browser) await browser.close();
    }

    this.createPage = async (url) => {
        if (!browser) browser = await runBrowser();
        return await createPage(browser,url);
    }

    async function runBrowser () {
        const bw = await puppeteer.launch({
            headless: true,
            devtools: false,
            ignoreHTTPSErrors: true,
            slowMo: 0,
            args: ['--disable-gpu','--no-sandbox','--no-zygote','--disable-setuid-sandbox','--disable-accelerated-2d-canvas','--disable-dev-shm-usage', "--proxy-server='direct://'", "--proxy-bypass-list=*"]
        });

        bw.on('disconnected', async () => {
            if (isReleased) return;
            console.log("BROWSER CRASH");
            if (retries <= 3) {
                retries += 1;
                if (browser && browser.process() != null) browser.process().kill('SIGINT');
                await _this.init();
            } else {
                throw "===================== BROWSER crashed more than 3 times";
            }
        });

        return bw;
    }

    async function createPage (browser,url) {
        const userAgent = randomUseragent.getRandom();
        const UA = userAgent || USER_AGENT;
        const page = await browser.newPage();
        await page.setViewport({
            width: 1920 + Math.floor(Math.random() * 100),
            height: 3000 + Math.floor(Math.random() * 100),
            deviceScaleFactor: 1,
            hasTouch: false,
            isLandscape: false,
            isMobile: false,
        });
        await page.setUserAgent(UA);
        await page.setJavaScriptEnabled(true);
        await page.setDefaultNavigationTimeout(0);
        if (!loadImages) {
            await page.setRequestInterception(true);
            page.on('request', (req) => {
                if(req.resourceType() == 'stylesheet' || req.resourceType() == 'font' || req.resourceType() == 'image'){
                    req.abort();
                } else {
                    req.continue();
                }
            });
        }

        await page.evaluateOnNewDocument(() => {
            //pass webdriver check
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false,
            });
        });

        await page.evaluateOnNewDocument(() => {
            //pass chrome check
            window.chrome = {
                runtime: {},
                // etc.
            };
        });

        await page.evaluateOnNewDocument(() => {
            //pass plugins check
            const originalQuery = window.navigator.permissions.query;
            return window.navigator.permissions.query = (parameters) => (
                parameters.name === 'notifications' ?
                    Promise.resolve({ state: Notification.permission }) :
                    originalQuery(parameters)
            );
        });

        await page.evaluateOnNewDocument(() => {
            // Overwrite the `plugins` property to use a custom getter.
            Object.defineProperty(navigator, 'plugins', {
                // This just needs to have `length > 0` for the current test,
                // but we could mock the plugins too if necessary.
                get: () => [1, 2, 3, 4, 5],
            });
        });

        await page.evaluateOnNewDocument(() => {
            // Overwrite the `plugins` property to use a custom getter.
            Object.defineProperty(navigator, 'languages', {
                get: () => ['en-US', 'en'],
            });
        });

        await page.goto(url, { waitUntil: 'networkidle2',timeout: 0 } );
        return page;
    }
}

module.exports = {ResourceManager}

Upvotes: 13

voidmind

Reputation: 156

I ran into the same issue and while your shelljs solution did work, it kills all chrome processes, which might interrupt one that is still processing a request. Here is a better solution that should work.

var puppeteer = require('puppeteer');
var express = require('express');
var router = express.Router();

router.get('/', function (req, res, next) {
    (async () => {
        await puppeteer.launch({ headless: true }).then(async browser => {
            const page = await browser.newPage();
            const url = req.query.url;
            await page.goto(url);
            let bodyHTML = await page.evaluate(() => document.body.innerHTML);
            await browser.close();
            res.send(bodyHTML);
        });
    })();
});

Upvotes: 0

mayank Chandel

Reputation: 58

Try closing the browser before sending the response:

var puppeteer = require('puppeteer');
var express = require('express');
var router = express.Router();

router.get('/', function(req, res, next) {
    (async () => {
        const browser = await puppeteer.launch({headless: true});
        const page = await browser.newPage();
        const url = req.query.url;
        await page.goto(url);
        let bodyHTML = await page.evaluate(() => document.body.innerHTML);
        await browser.close();
        res.send(bodyHTML);
    })();
});

Upvotes: 0

Elia Weiss

Reputation: 9866

I solved it with https://www.npmjs.com/package/shelljs:

var shell = require('shelljs');
shell.exec('pkill chrome')

Upvotes: 6
