Reputation: 143
I'm using Puppeteer for web scraping in a small Node.js web app that I made. The app is hosted on Heroku and uses jontewks/puppeteer-heroku-buildpack to work.
The problem I'm facing is that my app does not build anymore because of the Heroku slug size limit:
Compiled slug size: 537.4M is too large (max is 500M).
I've tried several things:
interactive_ui_tests.exe
headless_shell instead of Chromium (I use puppeteer-extra and puppeteer-extra-plugin-stealth, so it bothers me to change)
locales
An older version of Puppeteer (2.1.1), which uses an older version of Chromium that was slightly lighter
heroku repo:gc -a myapp and heroku builds:cache:purge -a myapp
My last three points reduced the size of my slug to 490M. So my app is working, but that's not great for the near future, for example if I want an up-to-date Puppeteer version.
So here I am, asking for help, as I do not have any more ideas at the moment.
Thank you very much for your help 🙏
Upvotes: 5
Views: 3481
Reputation: 143
In the end, I switched to Playwright.
With this buildpack, the build of my app is only 250 MB!
Here are the steps I followed:
Install playwright-chromium with npm to only download Chromium.
Set the PLAYWRIGHT_BUILDPACK_BROWSERS env variable to chromium in Heroku to only install Chromium dependencies.
Put this buildpack before the Node.js buildpack in Heroku.
With this trick you can use most of the stuff from puppeteer-stealth.
If you want, you can block resources like in Puppeteer:
// Abort requests for the listed resource types; let everything else through.
await page.route('**/*', route => ([
  'stylesheet',
  'image',
  'media',
  'font',
  // 'script',
  'texttrack',
  'xhr',
  'fetch',
  'eventsource',
  'websocket',
  'manifest',
  'other',
].includes(route.request().resourceType()) ? route.abort() : route.continue()))
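For context, here is a rough sketch of how that route handler could sit inside a complete script. The names BLOCKED_TYPES, shouldBlock, blockResources and scrapeTitle are hypothetical helpers made up for this example, and passing chromiumSandbox: false is only an assumption about what the Heroku environment needs, so treat it as a starting point rather than a reference setup.

```javascript
// Resource types to drop. 'script' is left out so pages can still run their JS.
const BLOCKED_TYPES = new Set([
  'stylesheet',
  'image',
  'media',
  'font',
  'texttrack',
  'xhr',
  'fetch',
  'eventsource',
  'websocket',
  'manifest',
  'other',
]);

// Hypothetical helper: true when a request of this type should be aborted.
function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Hypothetical helper: registers the blocking handler on a Playwright page.
function blockResources(page) {
  return page.route('**/*', (route) =>
    shouldBlock(route.request().resourceType()) ? route.abort() : route.continue()
  );
}

// Hypothetical entry point: opens a page with blocking enabled and returns its title.
async function scrapeTitle(url) {
  // Required lazily so the helpers above can be used without Playwright installed.
  const { chromium } = require('playwright-chromium');
  // Assumption: the sandbox is disabled explicitly for the Heroku dyno.
  const browser = await chromium.launch({ chromiumSandbox: false });
  try {
    const page = await browser.newPage();
    await blockResources(page);
    await page.goto(url);
    return await page.title();
  } finally {
    await browser.close();
  }
}
```

Calling scrapeTitle('https://example.com') should resolve with the page title. Since 'xhr' and 'fetch' are blocked, pages that load their content through API calls may come back empty; removing those two entries from the set would be the first thing to try in that case.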
Upvotes: 3