Reputation: 501

AWS S3 image saving loses metadata

I am working with an AWS Lambda function written in Python 2.7.x that downloads an image, saves it to /tmp, then uploads the image file back to a bucket.

My image's metadata starts out in the original bucket as HTTP headers such as Content-Type = image/jpeg, among others.

After saving my image with PIL, all headers are gone and I am left with Content-Type = binary/octet-stream.

From what I can tell, image.save is losing the headers due to the way PIL works. How do I either preserve the metadata or at least apply it to the newly saved image?

I have seen posts suggesting that this metadata is in EXIF, but I tried to get the EXIF info from the original file and apply it to the saved file with no luck. I am not clear whether it's in the EXIF data anyway.

Partial code to give an idea of what I am doing:

def resize_image(image_path):
    with Image.open(image_path) as image:
        # Re-save the image; upload_path is a global set in handler()
        image.save(upload_path, optimize=True)

def handler(event, context):
    global upload_path
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Use the current record rather than always the first one
        key = urllib.unquote_plus(record['s3']['object']['key'].encode("utf8"))

        download_path = '/tmp/{}{}'.format(uuid.uuid4(), file_name)
        upload_path = '/tmp/resized-{}'.format(file_name)

        s3_client.download_file(bucket, key, download_path)

        resize_image(download_path)
        s3_client.upload_file(upload_path, '{}resized'.format(bucket), key)

Thanks to Sergey, I switched to get_object, but the Metadata in the response is empty:

response = s3_client.get_object(Bucket=bucket,Key=key)

response= {u'Body': , u'AcceptRanges': 'bytes', u'ContentType': 'image/jpeg', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0, 'HostId': 'au30hBMN37/ti0WCfDqlb3t9ehainumc9onVYWgu+CsrHtvG0u/zmgcOIvCCBKZgQrGoooZoW9o=', 'RequestId': '1A94D7F01914A787', 'HTTPHeaders': {'content-length': '84053', 'x-amz-id-2': 'au30hBMN37/ti0WCfDqlb3t9ehainumc9onVYWgu+CsrHtvG0u/zmgcOIvCCBKZgQrGoooZoW9o=', 'accept-ranges': 'bytes', 'expires': 'Sun, 01 Jan 2034 00:00:00 GMT', 'server': 'AmazonS3', 'last-modified': 'Fri, 23 Dec 2016 15:21:56 GMT', 'x-amz-request-id': '1A94D7F01914A787', 'etag': '"9ba59e5457da0dc40357f2b53715619d"', 'cache-control': 'max-age=2592000,public', 'date': 'Fri, 23 Dec 2016 15:21:58 GMT', 'content-type': 'image/jpeg'}}, u'LastModified': datetime.datetime(2016, 12, 23, 15, 21, 56, tzinfo=tzutc()), u'ContentLength': 84053, u'Expires': datetime.datetime(2034, 1, 1, 0, 0, tzinfo=tzutc()), u'ETag': '"9ba59e5457da0dc40357f2b53715619d"', u'CacheControl': 'max-age=2592000,public', u'Metadata': {}}

If I use: metadata = response['ResponseMetadata']['HTTPHeaders']

metadata = {'content-length': '84053', 'x-amz-id-2': 'f5UAhWzx7lulo3cMVF8hdVRbHnhdnjHWRDl+LDFkYm9pubjL0A01L5yWjgDjWRE4TjRnjqDeA0U=', 'accept-ranges': 'bytes', 'expires': 'Sun, 01 Jan 2034 00:00:00 GMT', 'server': 'AmazonS3', 'last-modified': 'Fri, 23 Dec 2016 15:47:09 GMT', 'x-amz-request-id': '4C69DF8A58EF3380', 'etag': '"9ba59e5457da0dc40357f2b53715619d"', 'cache-control': 'max-age=2592000,public', 'date': 'Fri, 23 Dec 2016 15:47:10 GMT', 'content-type': 'image/jpeg'}

Saving with put_object

s3_client.put_object(Bucket=bucket+'resized',Key=key, Metadata=metadata, Body=downloadfile)

creates a lot of extra metadata in S3. It does not save Content-Type as image/jpeg but as binary/octet-stream, and it creates the user metadata x-amz-meta-content-type = image/jpeg instead.

Upvotes: 2

Answers (3)

Fayaz K

Reputation: 1

When you save an image with PIL/Pillow, it does not retain the original file's metadata. You'll have to manually set the ContentType and any other desired metadata when you put the object back into S3.

You can extract the metadata from the original S3 object and reapply it when you upload:

response = s3_client.get_object(Bucket=bucket, Key=key)
content_type = response['ContentType']
metadata = response['Metadata']

# ... process image ...

# When uploading the processed image back to S3:
s3_client.put_object(
    Bucket=bucket + 'resized',
    Key=key,
    Metadata=metadata,  # Include any existing metadata
    ContentType=content_type,  # Set the ContentType explicitly
    Body=processed_image  # The processed image file data
)

To fix the ContentType being set as binary/octet-stream, make sure to set the ContentType explicitly when calling put_object as shown above.

It's also important to note that the Metadata field in the put_object call is for user-defined metadata (prefixed with x-amz-meta- when stored). The system-defined metadata like ContentType should be specified as separate parameters in the put_object call, not inside the Metadata dictionary.

Additionally, you should avoid using ResponseMetadata['HTTPHeaders'] directly since it contains headers that are not meant to be user-defined metadata. Instead, focus on the S3-specific metadata that you want to preserve or set new metadata as needed.
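As a rough sketch of that split between system-defined attributes and user-defined metadata, the helper below builds the keyword arguments for put_object from a get_object response. The names SYSTEM_FIELDS and put_kwargs_from_response are made up for illustration, and the list of fields is an assumption about what you want to carry over, so trim or extend it as needed.

```python
# Attributes that get_object returns and put_object accepts under the
# same keyword names. Which ones to copy is an assumption for this sketch.
SYSTEM_FIELDS = (
    "ContentType",
    "CacheControl",
    "ContentDisposition",
    "ContentEncoding",
    "ContentLanguage",
    "Expires",
)


def put_kwargs_from_response(response):
    """Build keyword arguments for put_object from a get_object response."""
    kwargs = {}
    for field in SYSTEM_FIELDS:
        # Copy only the attributes that were actually set on the source object.
        if response.get(field) is not None:
            kwargs[field] = response[field]
    # User-defined metadata goes in Metadata; S3 stores it with the
    # x-amz-meta- prefix. Skip it when the source object has none.
    if response.get("Metadata"):
        kwargs["Metadata"] = response["Metadata"]
    return kwargs
```

It could then be used as s3_client.put_object(Bucket=bucket + 'resized', Key=key, Body=processed_image, **put_kwargs_from_response(response)), so Content-Type and Cache-Control arrive as real object attributes rather than as x-amz-meta-* entries.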

Upvotes: 0

Traz

Reputation: 270

Content type information is not stored in the file you upload; it has to be guessed or extracted somehow. You must do this manually or with tools. With a fairly small dictionary you can guess most file types.

When you upload a file or object, you have the chance to specify its content type. Otherwise S3 defaults to binary/octet-stream.

Using the boto3 Python package, for instance:
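One way to do the guessing, as a minimal sketch: Python's standard mimetypes module already ships such a dictionary keyed by file extension. The helper name guess_content_type and its fallback value are my own choices here; the fallback mirrors the binary/octet-stream value seen in the question.

```python
import mimetypes


def guess_content_type(key, default="binary/octet-stream"):
    """Guess a Content-Type from the file extension of an object key."""
    # guess_type returns a (type, encoding) pair; type is None when the
    # extension is unknown or missing.
    content_type, _encoding = mimetypes.guess_type(key)
    return content_type or default
```

This only looks at the extension, not the file contents, so a misnamed file will get the wrong type.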

s3client.upload_file(
    Filename=local_path,
    Bucket=bucket,
    Key=remote_path,
    ExtraArgs={
        "ContentType": "image/jpeg"
    }
)

Upvotes: 0

Sergey Kovalev

Reputation: 9411

You are confusing S3 metadata, stored by AWS S3 along with an object, and EXIF metadata, stored inside the file itself.

download_file() doesn't get object attributes from S3. Use get_object() instead: https://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.get_object

Then you can use put_object() with the same attributes to upload the new file: https://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.put_object
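A minimal sketch of that round trip, under some assumptions: the function name copy_object_with_attributes and the transform callback are hypothetical, s3_client is expected to be a boto3 S3 client, and only a handful of attributes are copied.

```python
def copy_object_with_attributes(s3_client, src_bucket, dst_bucket, key, transform):
    """Read an object, transform its bytes, and write it with the same attributes."""
    response = s3_client.get_object(Bucket=src_bucket, Key=key)
    body = response["Body"].read()

    put_args = {
        "Bucket": dst_bucket,
        "Key": key,
        "Body": transform(body),  # transform takes bytes and returns bytes
    }
    # Carry over the attributes from the source object when they are set.
    # This list is an assumption; add other fields if you rely on them.
    for field in ("ContentType", "CacheControl", "Expires", "Metadata"):
        if response.get(field):
            put_args[field] = response[field]

    return s3_client.put_object(**put_args)
```

Here transform would be whatever image processing you do, for example opening the bytes with PIL through io.BytesIO and saving the result into another io.BytesIO, which also avoids the round trip through /tmp.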

Upvotes: 5
