Viktor Kerkez

Reputation: 46566

S3 Python - Multipart upload to s3 with presigned part urls

I'm unsuccessfully trying to do a multipart upload with pre-signed part URLs.

This is the procedure I follow (steps 1-3 are on the server side, step 4 is on the client side):

  1. Instantiate boto client.
import boto3
from botocore.client import Config

s3 = boto3.client(
    "s3",
    region_name=aws.default_region,
    aws_access_key_id=aws.access_key_id,
    aws_secret_access_key=aws.secret_access_key,
    config=Config(signature_version="s3v4")
)
  2. Initiate the multipart upload.
from datetime import datetime, timedelta

upload = s3.create_multipart_upload(
    Bucket=AWS_S3_BUCKET,
    Key=key,
    Expires=datetime.now() + timedelta(days=2),
)
upload_id = upload["UploadId"]
upload_id = upload["UploadId"]
  3. Create a pre-signed URL for the part upload.

part = generate_part_object_from_client_submited_data(...)

part.presigned_url = s3.generate_presigned_url(
    ClientMethod="upload_part",
    Params={
        "Bucket": AWS_S3_BUCKET,
        "Key": upload_key,
        "UploadId": upload_id,
        "PartNumber": part.no,
        "ContentLength": part.size,
        "ContentMD5": part.md5,
    },
    ExpiresIn=3600,  # 1h
    HttpMethod="PUT",
)

Return the pre-signed URL to the client.

  4. On the client, try to upload the part using requests.
import io
import requests

part = receive_part_object_from_server(...)

with io.open(filename, "rb") as f:
    f.seek(part.offset)
    buffer = io.BytesIO(f.read(part.size))

r = requests.put(
    part.presigned_url,
    data=buffer,
    headers={
        "Content-Length": str(part.size),
        "Content-MD5": part.md5,
        "Host": "AWS_S3_BUCKET.s3.amazonaws.com",
    },
)

And when I try to upload I either get:

urllib3.exceptions.ProtocolError:
('Connection aborted.', BrokenPipeError(32, 'Broken pipe'))

Or:

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchUpload</Code>
  <Message>
    The specified upload does not exist. The upload ID may be invalid,
    or the upload may have been aborted or completed.
  </Message>
  <UploadId>CORRECT_UPLOAD_ID</UploadId>
  <RequestId>...</RequestId>
  <HostId>...</HostId>
</Error>

Even though the upload still exists and I can list it.

Can anyone tell me what I am doing wrong?

Upvotes: 23

Views: 11840

Answers (4)

Abdullah Khawer

Reputation: 5678

Presigned URL Approach

You can study AWS S3 presigned URLs for the Python SDK (Boto3) and the multipart upload APIs at the following links; a combined sketch follows the list:

  1. Amazon S3 Examples > Presigned URLs
  2. Python Code Samples for Amazon S3 > generate_presigned_url.py
  3. Boto3 > S3 > create_multipart_upload
  4. Boto3 > S3 > complete_multipart_upload
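
Putting those APIs together, here is a minimal sketch of the presigned-URL multipart flow. The bucket name, key, file path, and part size are placeholder assumptions, and error handling and retries are omitted:

import boto3
import requests

s3 = boto3.client("s3")  # assumes credentials and region come from the environment

bucket = "my-bucket"          # placeholder
key = "my-object"             # placeholder
filename = "/path/to/file"    # placeholder
part_size = 10 * 1024 * 1024  # every part except the last must be at least 5 MiB

# 1. Initiate the multipart upload (server side).
upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

parts = []
part_number = 1
with open(filename, "rb") as f:
    while True:
        data = f.read(part_size)
        if not data:
            break
        # 2. Presign the part upload (server side).
        url = s3.generate_presigned_url(
            ClientMethod="upload_part",
            Params={
                "Bucket": bucket,
                "Key": key,
                "UploadId": upload_id,
                "PartNumber": part_number,
            },
            ExpiresIn=3600,
        )
        # 3. Upload the part with plain HTTP (client side). Let requests set
        # Host and Content-Length itself so they match what was signed.
        r = requests.put(url, data=data)
        r.raise_for_status()
        parts.append({"ETag": r.headers["ETag"], "PartNumber": part_number})
        part_number += 1

# 4. Complete the upload with the collected part ETags (server side).
s3.complete_multipart_upload(
    Bucket=bucket,
    Key=key,
    UploadId=upload_id,
    MultipartUpload={"Parts": parts},
)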

Transfer Manager Approach

Boto3 provides transfer interfaces for S3 that automatically manage multipart and non-multipart uploads. To ensure that multipart uploads happen only when necessary, you can use the multipart_threshold configuration parameter.

Try out the following code for Transfer Manager approach:

import boto3
from boto3.s3.transfer import TransferConfig
from botocore.client import Config
from botocore.exceptions import EndpointConnectionError

def upload(source, dest, bucket_name):
    try:
        conn = boto3.client(
            service_name="s3",
            aws_access_key_id=[key],       # placeholder
            aws_secret_access_key=[key],   # placeholder
            endpoint_url=[endpoint],       # placeholder
            config=Config(signature_version='s3'),
        )
        config = TransferConfig(
            multipart_threshold=1024 * 20,  # use multipart above 20 KB
            max_concurrency=3,
            multipart_chunksize=1024 * 20,
            use_threads=True,
        )
        conn.upload_file(
            Filename=source,
            Bucket=bucket_name,
            Key=dest,
            Config=config,
        )
    except Exception as e:
        raise Exception(str(e))

def download(src, dest, bucket_name):
    try:
        conn = boto3.client(
            service_name="s3",
            aws_access_key_id=[key],       # placeholder
            aws_secret_access_key=[key],   # placeholder
            endpoint_url=[endpoint],       # placeholder
            config=Config(signature_version='s3'),
        )
        config = TransferConfig(
            multipart_threshold=1024 * 20,
            max_concurrency=3,
            multipart_chunksize=1024 * 20,
            use_threads=True,
        )
        conn.download_file(
            Bucket=bucket_name,
            Key=src,
            Filename=dest,
            Config=config,
        )
    except EndpointConnectionError:
        raise Exception("Unable to connect to AWS")
    except Exception as e:
        raise Exception(str(e))

if __name__ == '__main__':
    upload(source, dest, bucket_name)      # placeholder arguments
    download(src, dest, bucket_name)       # placeholder arguments

AWS STS Approach

You can also follow the AWS Security Token Service (STS) approach to generate a set of temporary credentials to complete your task instead.

Try out the following code for the AWS STS approach:

import json
from uuid import uuid4
import boto3

def get_upload_credentials_for(bucket, key, username):
    arn = 'arn:aws:s3:::%s/%s' % (bucket, key)
    policy = {"Version": "2012-10-17",
              "Statement": [{
                  "Sid": "Stmt1",
                  "Effect": "Allow",
                  "Action": ["s3:PutObject"],
                  "Resource": [arn],
              }]}
    client = boto3.client('sts')
    response = client.get_federation_token(
        Name=username, Policy=json.dumps(policy))
    return response['Credentials']

def client_from_credentials(service, credentials):
    return boto3.client(
        service,
        aws_access_key_id=credentials['AccessKeyId'],
        aws_secret_access_key=credentials['SecretAccessKey'],
        aws_session_token=credentials['SessionToken'],
    )

def example():
    bucket = 'mybucket'
    filename = '/path/to/file'

    key = uuid4().hex
    print(key)

    prefix = 'tmp_upload_'
    username = prefix + key[:32 - len(prefix)]
    print(username)
    assert len(username) <= 32  # required by the AWS API

    credentials = get_upload_credentials_for(bucket, key, username)
    client = client_from_credentials('s3', credentials)
    client.upload_file(filename, bucket, key)
    client.upload_file(filename, bucket, key + 'bob')  # fails: the policy only allows PutObject on the exact key

example()

MinIO Client SDK for Python Approach

You can use the MinIO Client SDK for Python, which implements simpler APIs that hide the gritty details of multipart upload.

For example, you can use the simple fput_object(bucket_name, object_name, file_path, content_type) API to do the upload.

Try out the following code for MinIO Client SDK for Python approach:

from minio import Minio
from minio.error import ResponseError
    
s3client = Minio(
    's3.amazonaws.com',
    access_key='YOUR-ACCESSKEYID',
    secret_key='YOUR-SECRETACCESSKEY'
)
    
# Put an object 'my-objectname' with contents from 'my-filepath'
    
try:    
    s3client.fput_object(
        'my-bucketname',
        'my-objectname',
        'my-filepath'
    )
except ResponseError as err:
    print(err)

Upvotes: 3

Piotr

Reputation: 406

Make sure that when you connect to the S3 endpoints you use the proper S3 domain name, which should include the region! The bucket name in the Host header alone is not enough.

The easiest way to debug this is to generate a presigned URL with the AWS CLI using the --debug option

aws s3 presign s3://your-bucket/file --expires-in 604800 --region eu-central-1 --debug

and then just use it with curl

curl -X GET "https://your-bucket.s3.eu-central-1.amazonaws.com/file"

Normally the AWS client can redirect based on the bucket name (it contains quite a lot of logic), but a plain HTTP client will not, so you need to talk to the proper endpoint.

In other words, change:

"Host": "AWS_S3_BUCKET.s3.amazonaws.com"

to

"Host": "AWS_S3_BUCKET.s3.REGION.amazonaws.com"

Upvotes: 0

Fabio Manzano

Reputation: 2865

Did you try pre-signed POST instead? Here is the AWS reference for it: https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/s3-presigned-post.html

This can potentially work around proxy limitations on the client side, if any:

pre-signed POST example
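
For reference, a minimal presigned-POST sketch with boto3 and requests; the bucket name, key, and file path are placeholder assumptions:

import boto3
import requests

s3 = boto3.client("s3")  # assumes credentials and region come from the environment

# Server side: generate the URL and the signed form fields.
post = s3.generate_presigned_post(
    Bucket="my-bucket",  # placeholder
    Key="my-object",     # placeholder
    ExpiresIn=3600,
)

# Client side: POST the signed fields plus the file itself.
with open("/path/to/file", "rb") as f:
    r = requests.post(post["url"], data=post["fields"], files={"file": f})
print(r.status_code)  # S3 returns 204 on success by default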

As a last resort, you can always try the good old REST API, although I don't think the issue is in your code or in boto3: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingRESTAPImpUpload.html

Upvotes: 1

Piotr Czapla

Reputation: 26522

Here is a command-line utility that does exactly the same thing; you might want to give it a try and see if it works. If it does, it will be easy to find the difference between your code and theirs. If it doesn't, I would double-check the whole process. Here is an example of how to upload a file using the AWS command line: https://aws.amazon.com/premiumsupport/knowledge-center/s3-multipart-upload-cli/?nc1=h_ls

If it does work, i.e. you can replicate the upload using aws s3 commands, then we need to focus on the use of the presigned URL. You can check what the URL should look like here:

https://github.com/aws/aws-sdk-js/issues/468
https://github.com/aws/aws-sdk-js/issues/1603

These are JS SDK issues, but the discussions there cover the raw URLs and parameters, so you should be able to spot the difference between your URLs and the ones that work.

Another option is to try this project; it uses JS to upload a file using presigned URLs from a web browser.

https://github.com/prestonlimlianjie/aws-s3-multipart-presigned-upload

If it works, you can inspect the communication and observe the exact URLs used to upload each part, and compare them with the URLs your system is generating.

By the way, once you have a working multipart upload you can use aws s3 presign to obtain the presigned URL; this should let you finish the upload using just curl, giving you full control over the upload process.

Upvotes: 1
