Reputation: 1
I am calling a Lambda function through API Gateway. The user passes in a PDF, which is read as bytes and sent to the Lambda function. The function breaks while looping over the pages, at the line "page_text = page_obj.extractText()". Here is the code for the Lambda function:
import base64
import io
import json

import PyPDF2


def lambda_handler(event, context):
    # Extract event data
    file_content = event["content"]
    decode_content = base64.b64decode(file_content)
    read_bytes = io.BytesIO(decode_content)

    # Read file into PyPDF2
    bill_reader = PyPDF2.PdfReader(read_bytes)

    # Get Number of Pages in Bill
    bill_pages = len(bill_reader.pages)
    print(bill_pages)

    # Get Text from Bill
    bill_text = []
    for page_num in range(bill_pages):
        page_obj = bill_reader.pages[page_num]
        page_text = page_obj.extractText()  # Lambda fcn breaks here
        bill_text.append(page_text)
    bill_text = ''.join(bill_text)

    # There is more code after this to automatically summarize the text...
    return {
        'statusCode': 200,
        'body': json.dumps(bill_text)
    }
I am not sure exactly what happens, but the function seems to stall. The weird thing is that this code works when I run it locally, and it also works fine outside the loop if I manually specify each page to extract (i.e. page_obj = bill_reader.pages[0], then page_text = page_obj.extractText()).
Upvotes: 0
Views: 68
Reputation: 1
I figured it out. The function was running out of time partway through the loop, so I needed to increase the timeout for the Lambda function.
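For anyone hitting the same thing: a new Lambda function defaults to a 3-second timeout, which a multi-page PDF can easily exceed. Below is a rough sketch of how the loop could log per-page timing and stop cleanly before the deadline. The helper name extract_with_budget and the safety_margin_ms parameter are made up for illustration; the only Lambda-specific piece is context.get_remaining_time_in_millis(), which the Python runtime provides on the context object.

```python
import time


def extract_with_budget(pages, extract, remaining_ms, safety_margin_ms=2000):
    """Extract text page by page, stopping early if time is running out.

    pages            -- iterable of page objects (e.g. bill_reader.pages)
    extract          -- callable that turns one page into text
    remaining_ms     -- callable returning the milliseconds left; in Lambda,
                        pass context.get_remaining_time_in_millis
    safety_margin_ms -- hypothetical cutoff: stop when less than this is left

    Returns (texts, finished), where finished is False if we stopped early.
    """
    texts = []
    for index, page in enumerate(pages):
        left = remaining_ms()
        if left < safety_margin_ms:
            print(f"Stopping before page {index}: only {left} ms left")
            return texts, False

        started = time.perf_counter()
        texts.append(extract(page))
        elapsed_ms = (time.perf_counter() - started) * 1000
        print(f"page {index}: took {elapsed_ms:.0f} ms, "
              f"{remaining_ms()} ms remaining")
    return texts, True
```

Inside the handler above, this might be called as extract_with_budget(bill_reader.pages, lambda p: p.extractText(), context.get_remaining_time_in_millis). The log lines then show in CloudWatch how long each page takes, which makes it easier to pick a sensible timeout value rather than guessing.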
Upvotes: 0