Reputation: 1543

Parsing Base64 encoded data containing an image from AWS Lambda with Python

I have a Lambda function set up with a POST method that should receive an image as multipart/form-data, load the image, do some calculations, and return a simple array of numbers. The Lambda function sits behind an API Gateway with Lambda proxy integration enabled and multipart/form-data set as a Binary Media Type.

However, I can't for the life of me figure out how to parse the multipart/form-data that is returned from AWS Lambda.

The event['body'] contains base64-encoded data that I can't post here because it takes up too much space.

I use the following snippet of code to parse the multipart/form-data:

import base64

from requests_toolbelt.multipart import decoder

# body: the base64-encoded event['body']; data: the parsed JSON response containing the event
multipart_string = base64.b64decode(body)
content_type = data['event']['headers']['Content-Type']
multipart_data = decoder.MultipartDecoder(multipart_string, content_type)

where content_type is 'multipart/form-data; boundary=--------------------------881952313555430391739156'.

Running through the components of multipart_data like this:

for part in multipart_data.parts:
    print(part.content)
    print(part.headers)

gives this. The content (too long to post) looks like this:

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\ ... x00\x7f\xff\xd9'

and the headers:

{b'Content-Disposition': b'form-data; name="image"; filename="8281460-3x2-700x467.jpg"', b'Content-Type': b'image/jpeg'}

However, it is still not clear to me: a) which part of the content is the actual image, and b) how I can extract the image and, e.g., get it into PIL with Image.open.

Supplementary information:

Here is the small Flask app I use to POST the image and return the event data:

import json

from flask import Flask, request 

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def hello(event, context):

    response = {
        "statusCode": 200,
        "event": event
    }

    return {
        "body": json.dumps(response),
    }

and here is the Postman request as Python code:

import requests

url = "url-to-lambda-function"

payload = "------WebKitFormBoundary7MA4YWxkTrZu0gW\r\nContent-Disposition: form-data; name=\"image\"; filename=\"8281460-3x2-700x467.jpg\"\r\nContent-Type: image/jpeg\r\n\r\n\r\n------WebKitFormBoundary7MA4YWxkTrZu0gW--"
headers = {
    'content-type': "multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW",
    'User-Agent': "PostmanRuntime/7.18.0",
    'Accept': "*/*",
    'Cache-Control': "no-cache",
    'Content-Type': "multipart/form-data; boundary=--------------------------881952313555430391739156",
    'Accept-Encoding': "gzip, deflate",
    'Content-Length': "30417",
    'Connection': "keep-alive",
    'cache-control': "no-cache"
    }

response = requests.request("POST", url, data=payload, headers=headers)

print(response.text)

Upvotes: 5

Answers (3)

ExStackChanger

Reputation: 187

To add to tmo's answer: my multipart/form-data posts (to an AWS Lambda with API Gateway proxy integration) required that I access the Content-Type header like this instead:

content_type = event['multiValueHeaders']['Content-Type'][0]

and then access the parts of the form data from tmo's binary_content list with:

...
file_content = binary_content[0]
filename = str(binary_content[1].decode())
team_id = str(binary_content[2].decode())
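
Since the header can show up under different keys and casings depending on the client and the integration, a small lookup helper may save some trial and error. This is only a sketch: get_content_type is a hypothetical name, and it assumes the event is a plain dict with optional "headers" and "multiValueHeaders" entries.

```python
def get_content_type(event):
    """Return the Content-Type header from a proxy event, or None if absent."""
    # Single-value headers first, compared case-insensitively.
    for name, value in (event.get("headers") or {}).items():
        if name.lower() == "content-type":
            return value
    # Fall back to multiValueHeaders, where every value is a list.
    for name, values in (event.get("multiValueHeaders") or {}).items():
        if name.lower() == "content-type" and values:
            return values[0]
    return None
```

The returned string can then be passed to MultipartDecoder in place of a hard-coded dictionary lookup.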

Upvotes: 0

tmo

Reputation: 1543

For anyone coming here, this is how I ended up solving it:

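    import base64
    import io

    import numpy as np
    from PIL import Image
    from requests_toolbelt.multipart import decoder

    # event is the API Gateway proxy event passed to the Lambda handler
    body = event["body"]
    content_type = event["headers"]["Content-Type"]

    # The body arrives base64-encoded because multipart/form-data is a Binary Media Type
    body_dec = base64.b64decode(body)
    multipart_data = decoder.MultipartDecoder(body_dec, content_type)

    # Each part's content is the raw bytes of one form field
    binary_content = []
    for part in multipart_data.parts:
        binary_content.append(part.content)

    # The first part holds the image bytes; wrap them in a file-like object for PIL
    imageStream = io.BytesIO(binary_content[0])
    imageFile = Image.open(imageStream)
    imageArray = np.array(imageFile)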

which yields an array that you can work with. For me, the difficulty was in understanding how multipart/form-data was stitched together again.
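
To make the stitching concrete, here is a self-contained sketch that builds a tiny multipart/form-data body by hand, base64-encodes it the way it would arrive in event["body"], and takes it apart again. Note that it uses Python's standard-library email package rather than requests_toolbelt, purely so it runs without extra dependencies. The boundary, the file name photo.jpg, the fake image bytes, and the helper name split_multipart are all made up for illustration.

```python
import base64
from email import message_from_bytes

def split_multipart(body_b64, content_type):
    """Decode a base64 multipart/form-data body into (name, filename, bytes) tuples."""
    raw = base64.b64decode(body_b64)
    # The email parser expects the Content-Type header (with the boundary) before the body.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = message_from_bytes(prefix + raw)
    parts = []
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        # decode=True returns the part's payload as raw bytes.
        parts.append((name, part.get_filename(), part.get_payload(decode=True)))
    return parts

# A hand-built request body: one part, delimited by the boundary.
boundary = "example-boundary"
content_type = "multipart/form-data; boundary=" + boundary
jpeg_bytes = b"\xff\xd8\xff\xe0" + b"fake image data" + b"\xff\xd9"
raw_body = b"".join([
    b"--" + boundary.encode("ascii") + b"\r\n",
    b'Content-Disposition: form-data; name="image"; filename="photo.jpg"\r\n',
    b"Content-Type: image/jpeg\r\n",
    b"\r\n",
    jpeg_bytes,
    b"\r\n--" + boundary.encode("ascii") + b"--\r\n",
])

parts = split_multipart(base64.b64encode(raw_body).decode("ascii"), content_type)
name, filename, content = parts[0]
# The whole part content is the image: JPEG data starts with FF D8 and ends with FF D9.
print(name, filename, content.startswith(b"\xff\xd8"), content.endswith(b"\xff\xd9"))
```

In other words, everything between the blank line after the part headers and the next boundary is the file itself, which is why the part content can go straight into io.BytesIO.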

Upvotes: 5

Dániel Flach

Reputation: 195

AWS documentation says that the maximum payload size for a (REST) API Gateway is 10 MB. You did not provide your image size, but if it is more than 10 MB, consider redesigning your architecture. I would suggest uploading the image to S3 instead, with your Lambda function returning a signed URL for the upload. After the image is uploaded to S3, you can fetch the object inside your Lambda function and do your calculations. https://docs.aws.amazon.com/AmazonS3/latest/dev/UploadObjectPreSignedURLDotNetSDK.html
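
As a rough way to check an image against that limit before sending it, the arithmetic can be sketched as follows. The function names are hypothetical, and treating 10 MB as 10 * 1024 * 1024 bytes is an assumption. The sketch also reports the base64 size, since the body reaches the Lambda function base64-encoded and is therefore about a third larger than the raw file.

```python
# Assumption: the 10 MB limit is interpreted as 10 * 1024 * 1024 bytes.
PAYLOAD_LIMIT_BYTES = 10 * 1024 * 1024

def base64_size(raw_size):
    """Length in bytes of the padded base64 encoding of raw_size bytes."""
    # Every group of up to 3 input bytes becomes 4 output characters.
    return 4 * ((raw_size + 2) // 3)

def exceeds_limit(raw_size, limit=PAYLOAD_LIMIT_BYTES):
    """True if a payload of raw_size bytes is larger than the limit."""
    return raw_size > limit

image_size = 30417  # the Content-Length from the Postman request in the question
print(exceeds_limit(image_size), base64_size(image_size))
```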

Upvotes: 2
