Reputation: 1728

How can I POST a PDF to AWS Lambda?

I have AWS Lambda set up.

import json

def lambda_handler(event, context):
    # Echo the incoming event back to the caller
    return {
        'statusCode': 200,
        'body': json.dumps(event)
    }

I would like to POST a PDF file so that I can operate on it in my Lambda function.

Here is my POST code:

import requests

headers = {
    'X-API-KEY':'1234',
    'Content-type': 'multipart/form-data'}

files = {
    'document': open('my.pdf', 'rb')
}

r = requests.post(url, files=files,  headers=headers)

display(r)
display(r.text)

I am getting the error:

<Response [400]>
'{"message": "Could not parse request body into json: Unexpected character (\\\'-\\\' (code 45)) in numeric value: expected digit (0-9) to follow minus sign, for valid numeric value

How can I POST my PDF so that it arrives intact and I can access it in Lambda?

Note:

I am successful if I do this:

payload = '{"key1": "val1","key2": 22,"key3": 15,"key4": "val4"}' 
r = requests.post(url = URL, data=payload, headers=HEADERS)

It is just the PDF part that I can't get working.

Upvotes: 1

Answers (3)

curiouz

Reputation: 150

I found a solution by combining a few steps with ideas from this discussion here. Kudos to BrendonParker.

Step 1

Encode and decode your PDF. There are multiple ways to do this; mine was to create an endpoint using FastAPI. Within the endpoint, I create a temp file for the PDF and write its contents to it as base64-encoded bytes.


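import base64
import io
import os
import tempfile

# pdf_file is the uploaded file received by the endpoint
# (assumed here to be a FastAPI UploadFile)

# create tempfile
temp_dir = tempfile.mkdtemp()
temp_file_path = os.path.join(temp_dir, pdf_file.filename)

# encode: base64-encode the raw PDF bytes and write them to the temp file
content = base64.b64encode(pdf_file.file.read())
with open(temp_file_path, "wb") as temp_file:
    temp_file.write(content)

# decode: recover the original PDF bytes into an in-memory buffer
content = base64.b64decode(content)
buffer = io.BytesIO()
buffer.write(content)
buffer.seek(0)  # rewind so the buffer can be read from the start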

Step 2

I am using the AWS SAM CLI, which generated a template.yml file for me. In that file, add the following entries under BinaryMediaTypes:

BinaryMediaTypes:
  - application/pdf
  - multipart/form-data

With that in place, my PDF can be used in the next processing step.

Upvotes: 0

Duncan Andrew

Reputation: 31

I found this worked quite well for me:

Request

import requests


file_loc = 'path/to/test.pdf'
data = open(file_loc, 'rb').read()  # this is a bytes object
r = requests.post(url, data=data)
r.ok  # returns True (it is also a good idea to check r.text)

#one-liner
requests.post(url, data=open(file_loc,'rb').read())

Lambda - Python3.8

import io, base64

body = event["body"]
attachment = base64.b64decode(body.encode()) #this is a bytes object
buff = io.BytesIO(attachment) #this is now useable - read/write etc.

#one-liner
buff = io.BytesIO(base64.b64decode(event["body"].encode()))

Not quite sure why, but for me base64 encoding (even with urlsafe) in the original request corrupted the file and it was no longer recognised as a PDF in Lambda, so the OP's answer didn't work for me.
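For what it's worth, here is a minimal sketch of a handler that decodes the body only when it actually arrived base64-encoded. It assumes a Lambda proxy integration, where the event carries an "isBase64Encoded" flag alongside "body". The helper names read_body_bytes and looks_like_pdf are made up for illustration, and the check relies only on PDF files starting with "%PDF-".

```python
import base64
import io


def read_body_bytes(event):
    """Return the request body as raw bytes.

    Assumes a Lambda proxy integration event, where "isBase64Encoded"
    is True when the body has been base64-encoded in transit.
    """
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        return base64.b64decode(body)
    # Text bodies arrive as-is; encode so callers always get bytes.
    return body.encode("utf-8")


def looks_like_pdf(data):
    # PDF files begin with the "%PDF-" header.
    return data.startswith(b"%PDF-")


def lambda_handler(event, context):
    data = read_body_bytes(event)
    buff = io.BytesIO(data)  # usable like a file: read, seek, etc.
    return {
        "statusCode": 200 if looks_like_pdf(data) else 400,
        "body": str(len(buff.getvalue())),
    }
```

Checking the header before doing any real work makes it obvious whether the bytes were corrupted on the way in, which is the failure I ran into above.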

Upvotes: 1

bones225

Reputation: 1728

I figured it out. Took me a ton of time but I think I got it. Essentially it's all about encoding and decoding as bytes. Didn't have to touch the API Gateway at all.

Request:

import base64
import requests

HEADERS = {'X-API-KEY': '12345'}
data = '{"body" : "%s"}' % base64.b64encode(open(path, 'rb').read())
r = requests.post(url, data=data, headers=HEADERS)

In Lambda:

import base64
from io import BytesIO

def lambda_handler(event, context):
    pdf64 = event["body"]

    # The request formats the bytes with %s, so the value arrives as "b'<base64>'".
    # Drop the leading "b'"; b64decode ignores the trailing quote by default.
    pdf64 = pdf64[2:].encode('utf-8')

    buffer = BytesIO()
    content = base64.b64decode(pdf64)
    buffer.write(content)
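A tidier variant of the same idea, as a rough sketch rather than the definitive way to do it: decode the base64 bytes to a str before building the JSON, so no "b'...'" wrapper ends up in the payload and the handler needs no slicing. The build_payload helper is a name made up for this example, and the handler assumes, as above, that the JSON payload reaches Lambda as the event itself.

```python
import base64
import json
from io import BytesIO


def build_payload(pdf_bytes):
    # b64encode returns bytes; decode to str so the JSON holds plain base64 text.
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return json.dumps({"body": encoded})


def lambda_handler(event, context):
    # Assumption: event["body"] is the base64 string sent by build_payload.
    content = base64.b64decode(event["body"])
    buffer = BytesIO(content)  # starts at position 0, ready to read
    return {
        "statusCode": 200,
        "body": json.dumps({"size": len(buffer.getvalue())}),
    }
```

On the client side this would be used as requests.post(url, data=build_payload(open(path, 'rb').read()), headers=HEADERS), with url, path and HEADERS as in the request above.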

Upvotes: 3