Reputation: 63
I have a Python function that scrapes a website's schedule and uploads it to an RDS database. The code works perfectly on my local machine (pyppeteer 2.0.0, Python 3.12). However, I've been trying to port it to AWS Lambda, and the browser keeps failing to launch.
I used this repository's Chromium executable (extracted from the /bin directory of npm i chrome-aws-lambda@~2.0.2 and uploaded to an S3 bucket with the appropriate permissions), which corresponded to the pyppeteer version I installed with my Lambda function (pip3 install pyppeteer -t .). The Python code first downloads the Chromium binary into the /tmp directory and then attempts to launch the browser from it with pyppeteer. My Lambda runtime is stuck on Python 3.9 because it's the latest version supported by "psycopg2._psycopg", but I don't think that's the issue.
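The download step described above might look roughly like the sketch below, assuming the binary is stored as a single object in S3. The bucket, key and s3_client parameters are hypothetical: the question's download_chromium() takes no arguments and its body is not shown. One detail worth checking is that S3 does not store POSIX permissions, so a file written by download_file is not executable; the sketch sets the executable bits explicitly before pyppeteer tries to start the binary.

```python
import os
import stat


def download_chromium(bucket, key, dest="/tmp/headless-chromium", s3_client=None):
    """Copy the Chromium binary from S3 into /tmp and mark it executable.

    bucket, key and s3_client are hypothetical parameters for illustration.
    """
    if not os.path.exists(dest):
        # Warm Lambda invocations can reuse /tmp, so only download when missing.
        if s3_client is None:
            import boto3  # imported lazily so a fake client can be injected in tests
            s3_client = boto3.client("s3")
        s3_client.download_file(bucket, key, dest)
    # S3 objects carry no permission bits, so add execute for user/group/other.
    mode = os.stat(dest).st_mode
    os.chmod(dest, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest
```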
Does anyone have any ideas as to why my browser fails to launch within my AWS Lambda runtime? I think it might be a problem with my arguments for the launch() function, but I'm unsure where to go from here.
Error Line:
browser = await launch(
    headless=True,
    args=[
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-gpu',
        "--single-process",
        "--disable-dev-shm-usage",
        "--no-zygote",
    ],
    executablePath="/tmp/headless-chromium",
    userDataDir="/tmp",
)
Error in Lambda Console:
Status: Failed
Test Event Name: testEvent
Response:
{
    "errorMessage": "Browser closed unexpectedly:\n",
    "errorType": "BrowserError",
    "requestId": "a6a20222-6082-4618-b388-fbd4c88bda7d",
    "stackTrace": [
        " File \"/var/task/lambda_function.py\", line 369, in lambda_handler\n shows = asyncio.run(scrape_all_schedules())\n",
        " File \"/var/lang/lib/python3.9/asyncio/runners.py\", line 44, in run\n return loop.run_until_complete(main)\n",
        " File \"/var/lang/lib/python3.9/asyncio/base_events.py\", line 647, in run_until_complete\n return future.result()\n",
        " File \"/var/task/lambda_function.py\", line 225, in scrape_all_schedules\n browser = await launch(\n",
        " File \"/var/task/pyppeteer/launcher.py\", line 307, in launch\n return await Launcher(options, **kwargs).launch()\n",
        " File \"/var/task/pyppeteer/launcher.py\", line 168, in launch\n self.browserWSEndpoint = get_ws_endpoint(self.url)\n",
        " File \"/var/task/pyppeteer/launcher.py\", line 227, in get_ws_endpoint\n raise BrowserError('Browser closed unexpectedly:\\n')\n"
    ]
}
Function Logs:
Request ID: a6a20222-6082-4618-b388-fbd4c88bda7d
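The function logs contain nothing besides the request ID, and the BrowserError message is empty after "Browser closed unexpectedly:". pyppeteer does not forward Chromium's own stdout/stderr unless dumpio=True is passed to launch(). A minimal sketch, keeping the question's arguments unchanged and only adding that option, is below; build_launch_options and launch_browser are hypothetical helper names. With dumpio enabled, Chromium's crash output should then show up alongside the function's other log lines.

```python
def build_launch_options(executable_path="/tmp/headless-chromium", user_data_dir="/tmp"):
    """Return the launch() keyword arguments from the question plus dumpio=True."""
    return {
        "headless": True,
        "args": [
            "--no-sandbox",
            "--disable-setuid-sandbox",
            "--disable-gpu",
            "--single-process",
            "--disable-dev-shm-usage",
            "--no-zygote",
        ],
        "executablePath": executable_path,
        "userDataDir": user_data_dir,
        # Pipe Chromium's stdout/stderr into this process's output so the
        # reason for the crash is written to the function logs.
        "dumpio": True,
    }


async def launch_browser():
    # Imported here so build_launch_options() works without pyppeteer installed.
    from pyppeteer import launch
    return await launch(**build_launch_options())
```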
FULL CODE:
import asyncio
from datetime import datetime
from pyppeteer import launch
import os
import psycopg2
import boto3


async def scrape_all_schedules():
    current_day = datetime.now()
    download_chromium()  # defined elsewhere (not shown): copies the binary from S3 into /tmp

    chromium_path = '/tmp/headless-chromium'
    if os.path.exists(chromium_path):
        print("Chromium binary found. Launching browser...")
    else:
        print(f"Error: Chromium binary not found at {chromium_path}")

    browser = await launch(
        headless=True,
        args=[
            '--no-sandbox',
            '--disable-setuid-sandbox',
            '--disable-gpu',
            "--single-process",
            "--disable-dev-shm-usage",
            "--no-zygote",
        ],
        executablePath="/tmp/headless-chromium",
        userDataDir="/tmp",
    )
    page = await browser.newPage()
    # ... schedule scraping and the psycopg2 upload to RDS are omitted here ...
    shows = []  # placeholder for the list of shows the omitted code builds
    await browser.close()
    return shows


def lambda_handler(event, context):
    print("Starting scraping process...")
    shows = asyncio.run(scrape_all_schedules())
    return {
        'statusCode': 200,
        'body': f"Successfully saved {len(shows)} shows to the database."
    }
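Because the existence check in scrape_all_schedules() only prints a message, the launch proceeds even when the binary is unusable, and pyppeteer then reports only the generic BrowserError. A preflight check along these lines raises early with the process's own stderr, which is where a dynamic-linker message such as "error while loading shared libraries" would appear. This is a sketch: preflight_chromium is a hypothetical helper, and it assumes the Chromium build answers --version (pass different probe_args if it does not).

```python
import os
import subprocess


def preflight_chromium(path="/tmp/headless-chromium", probe_args=("--version",), timeout=30):
    """Run the Chromium binary directly and raise with its stderr if it fails."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Chromium binary not found at {path}")
    if not os.access(path, os.X_OK):
        raise PermissionError(f"Chromium binary at {path} is not executable")
    # If a shared library is missing, the loader aborts before Chromium's main()
    # runs and prints the reason on stderr; capture it instead of discarding it.
    result = subprocess.run(
        [path, *probe_args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"Chromium exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout.strip()
```

Calling preflight_chromium() just before launch() would put that message into the Lambda error response instead of the empty "Browser closed unexpectedly:" text.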
Reputation: 63
From my understanding, the AWS Lambda execution environment (Amazon Linux 2 for the Python 3.9 runtime, Amazon Linux 2023 for newer runtimes) does not allow installing the system-level libraries that headless Chromium needs under pyppeteer (yum install -y libX11 libX11-devel libXcomposite libXcursor libXdamage libXrandr libXi libXtst libXScrnSaver). I was able to get my script running on an EC2 instance instead and would recommend that others who face this problem do the same.
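To confirm which shared libraries the Chromium binary is actually missing in a given environment, ldd can list its unresolved dependencies. The sketch below assumes the ldd utility is available wherever it is run (for example on the EC2 instance, or in a Linux environment that mirrors the Lambda runtime); missing_shared_libraries and check_binary are hypothetical helper names.

```python
import subprocess


def missing_shared_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "\tlibX11.so.6 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>", 1)[0].strip())
    return missing


def check_binary(path="/tmp/headless-chromium"):
    # Assumes ldd is installed in the environment where this runs.
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_shared_libraries(result.stdout)
```

An empty list from check_binary() means the dynamic loader can resolve every dependency, which would point the investigation back at the launch arguments rather than the environment.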