Arun Kumar

Reputation: 525

Error importing pdfminer in AWS Lambda

Sorry for repeating a question that was already asked here before (How to convert pdf file from s3 to string variable using lambda function); the answers there didn't solve my problem.

My Lambda function shows this error:

Unable to import module 'lambda_function': No module named 'pdfminer'

I found the code below in this answer, but I am stuck implementing it in Lambda. Please share your ideas. If the code below is correct, I think the data variable will contain the PDF file from S3 converted to a string. If not, please suggest how I should change my code.

import io
from urllib.parse import unquote_plus

import boto3
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage

s3 = boto3.client('s3')


def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    # Object keys in S3 event notifications are URL-encoded.
    key = unquote_plus(event['Records'][0]['s3']['object']['key'])
    filename = 'myfile'
    local_path = '/tmp/' + filename
    s3.download_file(bucket, key, local_path)
    print('reading')

    rsrcmgr = PDFResourceManager()
    retstr = io.StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    # Create a PDF interpreter object.
    interpreter = PDFPageInterpreter(rsrcmgr, device)

    # PDFPage.get_pages() needs a file object opened in binary mode,
    # not the string returned by read().
    with open(local_path, 'rb') as fp:
        # Process each page contained in the document.
        for page in PDFPage.get_pages(fp):
            interpreter.process_page(page)
    device.close()

    data = retstr.getvalue()
    print(data)

Upvotes: 0

Views: 1527

Answers (1)

krishna_mee2004

Reputation: 7356

The problem is that your Lambda function cannot find the pdfminer library, because it is not present in the Lambda container. To fix this, install the library in the root of your application (the directory that contains your lambda_handler file) so that it is uploaded together with your code. There are two ways to do this:

  1. Install pdfminer by running this command in the root of your application: pip install pdfminer -t ./
  2. Create a requirements.txt file in the root of your application and list pdfminer in it. Then install all the dependencies by running this command in the same directory: pip install -r requirements.txt -t ./

It is always recommended that you run the above commands in a virtual environment.
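
Once the library sits next to your handler, the whole directory has to be zipped with the files at the top level of the archive, not nested inside a parent folder. Here is a minimal sketch of that packaging step using only the standard library. The function name build_deployment_zip and the app/ and function.zip paths are hypothetical names chosen for illustration. Also note that on Python 3 the maintained fork of the library is published on PyPI as pdfminer.six (still imported as pdfminer); whether you need it depends on your runtime, which I am only assuming here.

```python
import zipfile
from pathlib import Path


def build_deployment_zip(app_dir, zip_path):
    """Zip every file under app_dir, storing paths relative to app_dir.

    Returns the list of archive names so the layout can be inspected.
    """
    app_dir = Path(app_dir)
    zip_path = Path(zip_path)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(app_dir.rglob("*")):
            # Skip directories and the archive itself if it lives in app_dir.
            if not path.is_file() or path.resolve() == zip_path.resolve():
                continue
            arcname = path.relative_to(app_dir).as_posix()
            archive.write(path, arcname)
            names.append(arcname)
    return names
```

For example, build_deployment_zip("app", "function.zip") should list lambda_function.py and the pdfminer/ package files side by side at the root of the archive. If lambda_function.py shows up under a subfolder instead, Lambda will not be able to import it.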

References:

  • Refer to this AWS document about creating deployment packages for Lambda.
  • Refer to this Stack Overflow question, where they had a similar issue with another dependency.
  • A document listing all the modules that come installed by default in a Python Lambda environment can be found here.
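
If you are unsure whether a module is available in a given environment, a quick check like the sketch below can tell you before the handler fails at import time. It uses importlib.util.find_spec from the standard library; the helper name missing_modules is hypothetical. Keep in mind that running it on your machine only describes your local environment; to learn what the Lambda runtime provides, you would have to run it inside the function itself.

```python
import importlib.util


def missing_modules(names):
    """Return the names whose top-level package cannot be found on sys.path."""
    missing = []
    for name in names:
        # Only look up the top-level package, e.g. "pdfminer" for "pdfminer.layout".
        top_level = name.split(".")[0]
        if importlib.util.find_spec(top_level) is None:
            missing.append(name)
    return missing


# The result depends on what is installed where this runs.
print(missing_modules(["json", "boto3", "pdfminer"]))
```

Any name printed here has to be bundled into your deployment package as described above.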

Upvotes: 2
