raj

Reputation: 1

Trying to compress a PDF file stored in an AWS S3 bucket using AWS Lambda with Python, but it shows an error

I am trying to compress a PDF file stored in an S3 bucket (testforbucket023); the compressed PDF should then be stored in another S3 bucket (testforbucket023-compressed). I created a Python deployment package using Python code I found on the internet. When I run the code in the AWS Lambda function, it shows this error:

"errorMessage": "Unable to import module 'lambda_function': No module named 'lambda_function'", "errorType": "Runtime.ImportModuleError"

test.pdf is the PDF file stored in the input S3 bucket (testforbucket023) that I want to compress.
My deployment package (lambda_function.zip) contains all the Python libraries/packages (boto3, PyPDF2, typing_extensions, botocore).

Can anyone please suggest troubleshooting steps for this error?

The Python code inside the lambda_function.py file is:

import boto3
import botocore
import PyPDF2
import io

def lambda_handler(event, context):
    # Retrieve the input and output S3 bucket names from the event
    input_bucket = event['testforbucket023']
    output_bucket = event['testforbucket023-compressed']

    # Retrieve the S3 object key for the input PDF file from the event
    key = event['test.pdf']

    # Create an S3 client
    s3 = boto3.client('s3')

    try:
        # Download the PDF file from the input S3 bucket
        response = s3.get_object(Bucket=input_bucket, Key=key)
        pdf_content = response['Body'].read()

        # Compress the PDF file
        compressed_pdf_content = compress_pdf(pdf_content)

        # Upload the compressed PDF file to the output S3 bucket
        compressed_key = f'compressed_{key}'
        s3.put_object(Body=compressed_pdf_content, Bucket=output_bucket, Key=compressed_key)

        return {
            'statusCode': 200,
            'body': f'Compressed PDF file uploaded as {compressed_key}'
        }
    except botocore.exceptions.ClientError as e:
        return {
            'statusCode': 500,
            'body': str(e)
        }

def compress_pdf(pdf_content):
    # Create a PDF reader
    pdf_reader = PyPDF2.PdfFileReader(io.BytesIO(pdf_content))

    # Create a PDF writer
    pdf_writer = PyPDF2.PdfFileWriter()

    # Iterate through each page of the PDF file
    for page_num in range(pdf_reader.getNumPages()):
        # Get the page
        page = pdf_reader.getPage(page_num)

        # Compress the page
        page.compressContentStreams()

        # Add the compressed page to the PDF writer
        pdf_writer.addPage(page)

    # Create an in-memory stream for the compressed PDF content
    compressed_pdf_stream = io.BytesIO()
    pdf_writer.write(compressed_pdf_stream)
    compressed_pdf_stream.seek(0)

    return compressed_pdf_stream.getvalue()




I replaced 'input_bucket' and 'output_bucket' in the source file with the names of my input and output buckets. The original Python code was:



import boto3
import botocore
import PyPDF2
import io

def lambda_handler(event, context):
    # Retrieve the input and output S3 bucket names from the event
    input_bucket = event['input_bucket']
    output_bucket = event['output_bucket']

    # Retrieve the S3 object key for the input PDF file from the event
    key = event['key']

    # Create an S3 client
    s3 = boto3.client('s3')

    try:
        # Download the PDF file from the input S3 bucket
        response = s3.get_object(Bucket=input_bucket, Key=key)
        pdf_content = response['Body'].read()

        # Compress the PDF file
        compressed_pdf_content = compress_pdf(pdf_content)

        # Upload the compressed PDF file to the output S3 bucket
        compressed_key = f'compressed_{key}'
        s3.put_object(Body=compressed_pdf_content, Bucket=output_bucket, Key=compressed_key)

        return {
            'statusCode': 200,
            'body': f'Compressed PDF file uploaded as {compressed_key}'
        }
    except botocore.exceptions.ClientError as e:
        return {
            'statusCode': 500,
            'body': str(e)
        }

def compress_pdf(pdf_content):
    # Create a PDF reader
    pdf_reader = PyPDF2.PdfFileReader(io.BytesIO(pdf_content))

    # Create a PDF writer
    pdf_writer = PyPDF2.PdfFileWriter()

    # Iterate through each page of the PDF file
    for page_num in range(pdf_reader.getNumPages()):
        # Get the page
        page = pdf_reader.getPage(page_num)

        # Compress the page
        page.compressContentStreams()

        # Add the compressed page to the PDF writer
        pdf_writer.addPage(page)

    # Create an in-memory stream for the compressed PDF content
    compressed_pdf_stream = io.BytesIO()
    pdf_writer.write(compressed_pdf_stream)
    compressed_pdf_stream.seek(0)

    return compressed_pdf_stream.getvalue()
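
A side note on the modified version: the strings inside event[...] are key names looked up in the invocation event, not the values themselves, so event['testforbucket023'] would raise a KeyError unless the test event happens to contain that key. This is separate from the import error and would only surface once the module loads. A minimal sketch, assuming the original key names are kept and the names from this question serve only as fallbacks (resolve_targets and DEFAULTS are hypothetical helpers, not part of the original code):

```python
# Hypothetical fallbacks, taken from the bucket and file names in the question.
DEFAULTS = {
    "input_bucket": "testforbucket023",
    "output_bucket": "testforbucket023-compressed",
    "key": "test.pdf",
}


def resolve_targets(event):
    """Return (input_bucket, output_bucket, key), preferring values in the event."""
    event = event or {}
    # Dicts keep insertion order, so the tuple follows the order of DEFAULTS.
    return tuple(event.get(name, default) for name, default in DEFAULTS.items())
```

Inside lambda_handler this could replace the three lookups with input_bucket, output_bucket, key = resolve_targets(event). Alternatively, keep the original code unchanged and pass the three values in the test event.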

Upvotes: 0

Views: 908

Answers (1)

Shubham Bansal

Reputation: 488

Steps to troubleshoot:

  1. Unzip your package on your laptop and check that lambda_function.py is in the root of the archive, i.e. that when building the Lambda package you zipped the contents of the folder and not the folder itself.
  2. Try creating a new Lambda function without a deployment package: paste the contents of lambda_function.py into the AWS Lambda console and see whether it gives an error related to the PyPDF2 module. If it does, the handler module itself is being found, so point 1 is most likely the problem with your package.
  3. Check in the Lambda function's settings in the AWS console that the handler (entry point) is set to lambda_function.lambda_handler.
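
Point 1 can also be checked from Python instead of unzipping by hand. The following is a rough sketch using only the standard library; build_package and handler_at_root are hypothetical helper names, and it assumes the zip file is written outside the source folder:

```python
import os
import zipfile


def build_package(src_dir, zip_path):
    """Zip the contents of src_dir so its files sit at the archive root.

    Assumes zip_path is located outside src_dir.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for folder, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(folder, name)
                # A path relative to src_dir drops the enclosing folder name.
                zf.write(full_path, os.path.relpath(full_path, src_dir))


def handler_at_root(zip_path, module_file="lambda_function.py"):
    """Return True if module_file is a top-level entry of the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return module_file in zf.namelist()
```

If handler_at_root("lambda_function.zip") returns False, the file is either missing or nested inside a folder in the archive, which would be consistent with the "No module named 'lambda_function'" error.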

Upvotes: 1
