Reputation: 46566
I'm unsuccessfully trying to do a multipart upload with pre-signed part URLs.
This is the procedure I follow (steps 1-3 are on the server side, step 4 is on the client side):

1. Initiate the multipart upload:
import boto3
from botocore.client import Config
from datetime import datetime, timedelta

s3 = boto3.client(
    "s3",
    region_name=aws.default_region,
    aws_access_key_id=aws.access_key_id,
    aws_secret_access_key=aws.secret_access_key,
    config=Config(signature_version="s3v4"),
)

upload = s3.create_multipart_upload(
    Bucket=AWS_S3_BUCKET,
    Key=key,
    Expires=datetime.now() + timedelta(days=2),
)
upload_id = upload["UploadId"]
2. Generate a pre-signed URL for each part:

part = generate_part_object_from_client_submited_data(...)
part.presigned_url = s3.generate_presigned_url(
    ClientMethod="upload_part",
    Params={
        "Bucket": AWS_S3_BUCKET,
        "Key": upload_key,
        "UploadId": upload_id,
        "PartNumber": part.no,
        "ContentLength": part.size,
        "ContentMD5": part.md5,
    },
    ExpiresIn=3600,  # 1 h
    HttpMethod="PUT",
)
3. Return the pre-signed URL to the client.
4. On the client, upload the part using requests:

import io
import requests

part = receive_part_object_from_server(...)
with io.open(filename, "rb") as f:
    f.seek(part.offset)
    buffer = io.BytesIO(f.read(part.size))

r = requests.put(
    part.presigned_url,
    data=buffer,
    headers={
        "Content-Length": str(part.size),
        "Content-MD5": part.md5,
        "Host": "AWS_S3_BUCKET.s3.amazonaws.com",
    },
)
And when I try to upload I either get:
urllib3.exceptions.ProtocolError:
('Connection aborted.', BrokenPipeError(32, 'Broken pipe'))
Or:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchUpload</Code>
  <Message>
    The specified upload does not exist. The upload ID may be invalid,
    or the upload may have been aborted or completed.
  </Message>
  <UploadId>CORRECT_UPLOAD_ID</UploadId>
  <RequestId>...</RequestId>
  <HostId>...</HostId>
</Error>
Even though the upload still exists and I can list it.
Can anyone tell me what I'm doing wrong?
Upvotes: 23
Views: 11840
Reputation: 5678
Presigned URL Approach
You can study AWS S3 pre-signed URLs for the Python SDK (Boto3) and the multipart upload APIs in the Boto3 documentation.
Transfer Manager Approach
Boto3 provides interfaces for managing various types of transfers with S3, automatically handling multipart and non-multipart uploads. To ensure that multipart uploads happen only when absolutely necessary, you can use the multipart_threshold configuration parameter.
Try out the following code for Transfer Manager approach:
import boto3
from boto3.s3.transfer import TransferConfig
import botocore
from botocore.client import Config


def upload(source, dest, bucket_name):
    try:
        conn = boto3.client(
            service_name="s3",
            aws_access_key_id=[key],
            aws_secret_access_key=[key],
            endpoint_url=[endpoint],
            config=Config(signature_version='s3'),
        )
        config = TransferConfig(
            multipart_threshold=1024 * 20,  # bytes; force multipart above 20 KB
            max_concurrency=3,
            multipart_chunksize=1024 * 20,
            use_threads=True,
        )
        conn.upload_file(
            Filename=source,
            Bucket=bucket_name,
            Key=dest,
            Config=config,
        )
    except Exception as e:
        raise Exception(str(e))


def download(src, dest, bucket_name):
    try:
        conn = boto3.client(
            service_name="s3",
            aws_access_key_id=[key],
            aws_secret_access_key=[key],
            endpoint_url=[endpoint],
            config=Config(signature_version='s3'),
        )
        config = TransferConfig(
            multipart_threshold=1024 * 20,
            max_concurrency=3,
            multipart_chunksize=1024 * 20,
            use_threads=True,
        )
        conn.download_file(
            Bucket=bucket_name,
            Key=src,
            Filename=dest,
            Config=config,
        )
    except botocore.exceptions.EndpointConnectionError:
        raise Exception("Unable to connect to AWS")
    except Exception as e:
        raise Exception(str(e))


if __name__ == '__main__':
    upload(source, dest, bucket_name)
    download(src, dest, bucket_name)
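Note that the TransferConfig sizes are in bytes, so multipart_threshold=1024*20 forces a multipart upload for any file larger than 20 KB; in practice you would usually keep the default (8 MB) or something larger.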
AWS STS Approach
You can also follow the AWS Security Token Service (STS) approach to generate a set of temporary credentials to complete your task instead.
Try out the following code for the AWS STS approach:
import json
from uuid import uuid4

import boto3


def get_upload_credentials_for(bucket, key, username):
    arn = 'arn:aws:s3:::%s/%s' % (bucket, key)
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "Stmt1",
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": [arn],
        }],
    }
    client = boto3.client('sts')
    response = client.get_federation_token(
        Name=username, Policy=json.dumps(policy))
    return response['Credentials']


def client_from_credentials(service, credentials):
    return boto3.client(
        service,
        aws_access_key_id=credentials['AccessKeyId'],
        aws_secret_access_key=credentials['SecretAccessKey'],
        aws_session_token=credentials['SessionToken'],
    )


def example():
    bucket = 'mybucket'
    filename = '/path/to/file'
    key = uuid4().hex
    print(key)

    prefix = 'tmp_upload_'
    username = prefix + key[:32 - len(prefix)]
    print(username)
    assert len(username) <= 32  # required by the AWS API

    credentials = get_upload_credentials_for(bucket, key, username)
    client = client_from_credentials('s3', credentials)
    client.upload_file(filename, bucket, key)
    client.upload_file(filename, bucket, key + 'bob')  # fails


example()
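The second upload_file call fails by design: the federation-token policy grants s3:PutObject on the single object ARN only, so the temporary credentials cannot write to any other key.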
MinIO Client SDK for Python Approach
You can use the MinIO Client SDK for Python, which implements simpler APIs that hide the gritty details of multipart upload.
For example, you can use the simple fput_object(bucket_name, object_name, file_path, content_type) API to do the job; it handles the multipart machinery for you.
Try out the following code for the MinIO Client SDK for Python approach:
from minio import Minio
from minio.error import ResponseError

s3client = Minio(
    's3.amazonaws.com',
    access_key='YOUR-ACCESSKEYID',
    secret_key='YOUR-SECRETACCESSKEY'
)

# Put an object 'my-objectname' with contents from 'my-filepath'
try:
    s3client.fput_object(
        'my-bucketname',
        'my-objectname',
        'my-filepath'
    )
except ResponseError as err:
    print(err)
Upvotes: 3
Reputation: 406
Make sure that when you connect to S3 endpoints you use the proper S3 domain name (which should include the region!). The bucket name in the Host header is not enough.
The easiest way to debug this is to generate a presigned URL with the AWS CLI using the --debug option:
aws s3 presign s3://your-bucket/file --expires-in 604800 --region eu-central-1 --debug
and then just use it with curl:
curl -X GET "https://your-bucket.s3.eu-central-1.amazonaws.com/file"
Normally the AWS client can redirect based on the bucket name (it contains quite a lot of logic), but a plain HTTP client will not, so you need to talk to the proper endpoint.
In other words, change:
"Host": "AWS_S3_BUCKET.s3.amazonaws.com"
to
"Host": "AWS_S3_BUCKET.s3.REGION.amazonaws.com"
Upvotes: 0
Reputation: 2865
Did you try pre-signed POST instead? Here is the AWS reference for it: https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/s3-presigned-post.html
This could potentially work around proxy limitations on the client side, if any.
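For illustration, a minimal pre-signed POST sketch with Boto3 (the bucket, key, and file path are placeholders): generate_presigned_post returns a URL plus form fields, which the client submits as a multipart/form-data POST.

import boto3
import requests

s3 = boto3.client("s3")

# Server side: create the POST policy for a single object key.
post = s3.generate_presigned_post(
    Bucket="my-bucket",
    Key="my-key",
    ExpiresIn=3600,
)

# Client side: send the returned fields together with the file.
with open("/path/to/file", "rb") as f:
    r = requests.post(
        post["url"],
        data=post["fields"],
        files={"file": ("my-key", f)},
    )
print(r.status_code)  # 204 on success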
As a last resort, you can always try the good old REST API, although I don't think the issue is in your code, nor in boto3: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingRESTAPImpUpload.html
Upvotes: 1
Reputation: 26522
Here is a command-line utility that does exactly the same thing; you might want to give it a try and see if it works. If it does, it will be easy to find the difference between your code and theirs. If it doesn't, I would double-check the whole process. Here is an example of how to upload a file using the AWS command line:
https://aws.amazon.com/premiumsupport/knowledge-center/s3-multipart-upload-cli/?nc1=h_ls
Actually, if it does work, i.e. you can replicate the upload using aws s3 commands, then we need to focus on the use of the presigned URL. You can check what the URL should look like here:
https://github.com/aws/aws-sdk-js/issues/468
https://github.com/aws/aws-sdk-js/issues/1603
These are JS SDK issues, but the people there talk about the raw URLs and parameters, so you should be able to spot the difference between your URLs and the URLs that are working.
Another option is to give this script a try; it uses JS to upload a file using presigned URLs from a web browser:
https://github.com/prestonlimlianjie/aws-s3-multipart-presigned-upload
If it works, you can inspect the communication and observe the exact URLs that are being used to upload each part, which you can compare with the URLs your system is generating.
By the way, once you have a working URL for the multipart upload, you can use aws s3 presign to obtain the presigned URL; this should let you finish the upload using just curl, so you have full control over the upload process.
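To compare URLs systematically, here is a small sketch that dumps the query parameters of a presigned URL (part_presigned_url is a placeholder for whatever your server generated); differences in uploadId, partNumber, X-Amz-Credential, or X-Amz-SignedHeaders between a working URL and a failing one usually point at the bug.

from urllib.parse import urlparse, parse_qs

def dump_presigned(url):
    parsed = urlparse(url)
    print("host:", parsed.netloc)  # must match the Host header you send
    print("path:", parsed.path)    # bucket/key, depending on addressing style
    for name, values in sorted(parse_qs(parsed.query).items()):
        print("%s = %s" % (name, values[0]))

dump_presigned(part_presigned_url)  # placeholder: a URL from your server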
Upvotes: 1