user12366250

Convert PDF pages into images using pdf2image in Python on AWS Lambda

Lambda handler code:

from pdf2image import convert_from_path, convert_from_bytes


def lambda_handler(event, context):
    # TODO implement
    f = "967.pdf"
    images = convert_from_path(f,dpi=150)

    return {
        'statusCode': 200,
        'body': images
    }

I am getting the following error:

   {
     "errorMessage": "Unable to get page count. Is poppler installed and in PATH?",
     "errorType": "PDFInfoNotInstalledError",
     "stackTrace": [
       "  File \"/var/task/lambda_function.py\", line 15, in lambda_handler\n    images = convert_from_path(f,dpi=150,poppler_path=poppler_path)\n",
       "  File \"/opt/python/pdf2image/pdf2image.py\", line 80, in convert_from_path\n    page_count = _page_count(pdf_path, userpw, poppler_path=poppler_path)\n",
       "  File \"/opt/python/pdf2image/pdf2image.py\", line 355, in _page_count\n    \"Unable to get page count. Is poppler installed and in PATH?\"\n"
     ]
   }

Upvotes: 2

Views: 4247

Answers (2)

max_x_x

Reputation: 41

In case someone comes across this issue: I would also consider using PyMuPDF to convert a PDF to PNG or JPEG. You can pip install it and include it in your deployment package for AWS Lambda, with no system-level dependencies needed. Then you can do something like this:

import io

import fitz  # PyMuPDF

# data = <read your PDF file as bytes>

document: fitz.Document = fitz.open(stream=io.BytesIO(data), filetype="pdf")

images = []
# Iterate over the pages and convert each one to the desired format (here PNG)
for page in document:
    pix = page.get_pixmap(alpha=False, dpi=200)
    image_bytes = pix.tobytes(output="png")
    buffer = io.BytesIO(image_bytes)
    buffer.seek(0)
    images.append(buffer)

Also, if you struggle to get the size of your deployment package down, you can remove the legacy directory "fitz_old" after installation. It contains legacy code that is not needed if you use the latest version, and removing it cuts the size of your deployment package by ~27 MB.
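One more note: the images list built in the loop holds in-memory buffers, and a Python Lambda handler has to return something JSON-serializable, so the buffers cannot go into the response body as they are. Here is a minimal sketch of one way to handle that, base64-encoding each page. The helper name buffers_to_response and the "pages" key are my own placeholders, not part of PyMuPDF or Lambda:

```python
import base64
import io
import json


def buffers_to_response(buffers):
    """Build a JSON-serializable response from a list of io.BytesIO image buffers.

    Hypothetical helper for illustration: each buffer is base64-encoded
    so the binary image data survives JSON serialization.
    """
    pages = [base64.b64encode(buf.getvalue()).decode("ascii") for buf in buffers]
    return {"statusCode": 200, "body": json.dumps({"pages": pages})}


# Usage with stand-in bytes instead of real PNG data
response = buffers_to_response([io.BytesIO(b"abc")])
print(response["statusCode"])
```

Keep in mind that Lambda responses have a size limit, so for large documents this is probably only practical for a few pages at a time.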

Upvotes: 0

Belval

Reputation: 1506

Poppler is not installed on Lambda, so you have to package it with your deployment. Since this comes up a lot, I made a repository for the procedure:

https://github.com/Belval/pdf2image-as-a-service

If for some reason you do not want to use the above, here are the general steps for building and including poppler in your package:

  1. Build poppler
  2. Move the bin/ directory and libpoppler into a specific directory in your package
  3. Edit your code to pass that directory as poppler_path

Again, you can also just read the script in as-a-function/amazon/lambda.sh
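For step 3, a rough sketch of what the handler might look like. This assumes the poppler build was copied into a poppler/ directory at the root of the deployment package, with the binaries in poppler/bin and libpoppler in poppler/lib. That layout and the poppler_paths helper are assumptions for illustration only, so adjust them to wherever your build actually ends up:

```python
import os


def poppler_paths(package_root):
    """Return (bin_dir, lib_dir) for a poppler build bundled under package_root/poppler.

    Hypothetical layout: adjust to match your own package.
    """
    base = os.path.join(package_root, "poppler")
    return os.path.join(base, "bin"), os.path.join(base, "lib")


def lambda_handler(event, context):
    # Imported here so the helper above can be used without pdf2image installed
    from pdf2image import convert_from_path

    # LAMBDA_TASK_ROOT points at the unpacked function code on Lambda
    root = os.environ.get("LAMBDA_TASK_ROOT", os.getcwd())
    bin_dir, lib_dir = poppler_paths(root)

    # Let the bundled binaries find the bundled shared library
    os.environ["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + os.environ.get("LD_LIBRARY_PATH", "")

    images = convert_from_path("967.pdf", dpi=150, poppler_path=bin_dir)

    # PIL images are not JSON-serializable, so report the page count instead
    return {"statusCode": 200, "body": f"{len(images)} page(s) converted"}
```

The point is just that poppler_path has to be the directory containing the pdfinfo and pdftoppm binaries, not the path to a single binary.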

Upvotes: 2
