Reputation: 3
I wrote a Python crawler and it works locally. I want to run it on a regular schedule, so I put it in AWS Lambda.
I downloaded chromedriver, put it in my project directory, and can use it on my local machine. In Lambda, though, I don't know how to upload the chromedriver file or set its path.
I tried an absolute path, but it didn't work. Should I upload chromedriver with the Lambda function? If so, how?
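My code on localhost:
from selenium import webdriver

# "options" was not defined in the original snippet; a default ChromeOptions is assumed here
options = webdriver.ChromeOptions()
chrome_driver_path = "../chromedriver_win32/chromedriver.exe"
driver = webdriver.Chrome(
    executable_path=chrome_driver_path,
    chrome_options=options,
)
URL = "https://*****.co.kr"
driver.get(URL)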
Upvotes: 0
Views: 972
Reputation: 4486
You can't run a .exe on Lambda because Lambda runs on Linux, but you CAN run Puppeteer, as the other answer suggests, and use it to parse the HTML.
Install Puppeteer (npm i puppeteer --save), bundle everything up (compress all your code and node_modules into a zip file), and deploy to AWS. Voila.
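Since the question is about a Python crawler, here is a rough sketch of the same bundling step done with Python's standard zipfile module. The function name build_package and the directory layout are assumptions for illustration, not part of any AWS tooling, and the output zip should be written outside the source directory so it does not end up inside itself.

```python
import zipfile
from pathlib import Path


def build_package(source_dir, zip_path):
    """Zip every file under source_dir, using paths relative to source_dir."""
    source = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store entries with forward slashes, relative to the package root
                archive.write(path, path.relative_to(source).as_posix())
    return zip_path
```

The resulting zip can then be uploaded as the function's deployment package, by hand or through a tool such as the Serverless Framework.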
I HIGHLY recommend the Serverless Framework, as it takes the pain out of deployments; you can get it here
Bear in mind that if your crawling job is going to take more than 15 minutes, you'll need to schedule it via cron on something like a t2.micro rather than Lambda, because the Lambda function will time out.
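If the job is close to the limit rather than far over it, one option is to stop early and hand back whatever is left. The sketch below assumes a Python handler and uses the get_remaining_time_in_millis() method that Lambda's Python context object provides; crawl_within_budget, fetch, and FakeContext are hypothetical names made up for this example.

```python
def crawl_within_budget(urls, fetch, context, safety_margin_ms=10_000):
    """Fetch URLs in order until the remaining execution time drops below the margin."""
    done, pending = [], list(urls)
    while pending:
        # Stop before Lambda kills the invocation; leftover URLs are returned
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break
        url = pending.pop(0)
        done.append(fetch(url))
    return {"done": done, "pending": pending}


class FakeContext:
    """Local stand-in for the Lambda context; replays a list of time readings."""

    def __init__(self, readings):
        self._readings = iter(readings)

    def get_remaining_time_in_millis(self):
        return next(self._readings)
```

In a real handler you would pass the context argument Lambda gives you and could queue the pending URLs for the next scheduled run.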
Upvotes: 1
Reputation: 7404
Lambda cannot run Windows executables (.exe files) because it runs on Linux. Have you considered headless Chrome, for example through Puppeteer?
https://github.com/GoogleChrome/puppeteer
https://pptr.dev/#?product=Puppeteer&version=v1.20.0&show=outline
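If you stay with Python and Selenium instead, the driver path has to differ between your Windows machine and Lambda. A minimal sketch, assuming a Linux build of chromedriver is shipped in a bin/ folder of the deployment package (that layout and the function name resolve_chromedriver_path are assumptions). LAMBDA_TASK_ROOT is the environment variable Lambda sets to the directory holding your function code. A matching headless Chrome binary would also need to be packaged, which this sketch does not cover.

```python
import os
import platform
import posixpath


def resolve_chromedriver_path(env=None, system=None):
    """Pick a chromedriver path for Lambda, Windows, or another local OS."""
    env = os.environ if env is None else env
    system = platform.system() if system is None else system

    task_root = env.get("LAMBDA_TASK_ROOT")
    if task_root:
        # Assumed layout: Linux chromedriver binary under bin/ in the package
        return posixpath.join(task_root, "bin", "chromedriver")
    if system == "Windows":
        # Path taken from the question's local setup
        return "../chromedriver_win32/chromedriver.exe"
    # Elsewhere, assume chromedriver is available on PATH
    return "chromedriver"
```

The returned value can be passed as the driver path when constructing webdriver.Chrome.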
Upvotes: 0