Reputation:
Lambda handler code:
from pdf2image import convert_from_path, convert_from_bytes

def lambda_handler(event, context):
    # TODO implement
    f = "967.pdf"
    images = convert_from_path(f, dpi=150)
    return {
        'statusCode': 200,
        'body': images
    }
I am getting this error:
{
  "errorMessage": "Unable to get page count. Is poppler installed and in PATH?",
  "errorType": "PDFInfoNotInstalledError",
  "stackTrace": [
    " File \"/var/task/lambda_function.py\", line 15, in lambda_handler\n images = convert_from_path(f,dpi=150,poppler_path=poppler_path)\n",
    " File \"/opt/python/pdf2image/pdf2image.py\", line 80, in convert_from_path\n page_count = _page_count(pdf_path, userpw, poppler_path=poppler_path)\n",
    " File \"/opt/python/pdf2image/pdf2image.py\", line 355, in _page_count\n \"Unable to get page count. Is poppler installed and in PATH?\"\n"
  ]
}
Upvotes: 2
Views: 4247
Reputation: 41
In case someone comes across this issue: I would also consider using PyMuPDF to convert PDF pages to PNG or JPEG. You can pip install it and include it in your deployment package for AWS Lambda, with no system-level dependencies needed. Then you can do something like this:
import io

import fitz  # PyMuPDF

# data = <read your PDF file as bytes>
document: fitz.Document = fitz.open(stream=io.BytesIO(data), filetype="pdf")
images = []
# Iterate over the pages and convert each one to the desired format (here PNG)
for page in document:
    pix = page.get_pixmap(alpha=False, dpi=200)
    image_bytes = pix.tobytes(output="png")
    buffer = io.BytesIO(image_bytes)
    buffer.seek(0)
    images.append(buffer)
Also, if you struggle to reduce the size of your deployment package, you can remove the legacy directory "fitz_old" after installation. It contains legacy code that is not needed if you use the latest version, and removing it cuts the size of your deployment package by ~27 MB.
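One thing to watch for when returning the result from the handler: a list of image buffers (or PIL images, as in the question) is not JSON serializable, so it cannot go into 'body' as-is. Here is a minimal sketch of one way around that, assuming you want the pages inline in the response. The helper name build_response and the "pages" key are made up for illustration.

```python
import base64
import io
import json


def build_response(images):
    """Turn a list of io.BytesIO image buffers into a JSON-serializable response.

    `build_response` and the "pages" key are hypothetical names, not part of
    any library.
    """
    # Base64-encode the raw bytes so they can be carried inside a JSON string.
    encoded = [base64.b64encode(buf.getvalue()).decode("ascii") for buf in images]
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"pages": encoded}),
    }
```

In the handler you would then end with return build_response(images). For large documents the encoded pages may exceed Lambda's response size limit, in which case writing the images to S3 and returning their keys is probably the more practical route.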
Upvotes: 0
Reputation: 1506
Poppler is not installed on Lambda; you have to package it with your deployment. Since this comes up a lot, I made a repository that walks through the procedure:
https://github.com/Belval/pdf2image-as-a-service
If for some reason you do not want to use the above, here are the general steps for building and including poppler in your package:
poppler_path
Again, you can also just read the script in as-a-function/amazon/lambda.sh
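For the poppler_path part, here is a rough sketch of how the handler could point pdf2image at the packaged binaries. It assumes the poppler executables (pdfinfo, pdftoppm) ended up in /opt/bin (for example via a layer) or in a poppler/bin folder next to your code. Both locations and the helper names find_poppler_dir and convert_pdf are assumptions for illustration, so adjust them to wherever your build actually puts the files.

```python
import os
import shutil

# Assumed locations; change these to match where your build copies poppler.
CANDIDATE_DIRS = ["/opt/bin", os.path.join(os.getcwd(), "poppler", "bin")]


def find_poppler_dir(candidates=CANDIDATE_DIRS):
    """Return the first directory that contains an executable pdfinfo.

    Returns None when nothing is found, so pdf2image falls back to PATH.
    """
    for directory in candidates:
        if shutil.which("pdfinfo", path=directory):
            return directory
    return None


def convert_pdf(pdf_path, dpi=150):
    # Imported here so find_poppler_dir can be used without pdf2image installed.
    from pdf2image import convert_from_path

    return convert_from_path(pdf_path, dpi=dpi, poppler_path=find_poppler_dir())
```

If find_poppler_dir() returns None inside Lambda, the binaries were not packaged where you expected, and you will see the same PDFInfoNotInstalledError as in the question.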
Upvotes: 2