blehman

Reputation: 1970

Complete a multipart_upload with boto3?

Tried this:

import boto3
from boto3.s3.transfer import TransferConfig, S3Transfer
path = "/temp/"
fileName = "bigFile.gz" # this happens to be a 5.9 Gig file
client = boto3.client('s3', region)
config = TransferConfig(
    multipart_threshold=4*1024, # number of bytes
    max_concurrency=10,
    num_download_attempts=10,
)
transfer = S3Transfer(client, config)
transfer.upload_file(path+fileName, 'bucket', 'key')

Result: 5.9 gig file on s3. Doesn't seem to contain multiple parts.

I found this example, but part is not defined.

import boto3

bucket = 'bucket'
path = "/temp/"
fileName = "bigFile.gz"
key = 'key'

s3 = boto3.client('s3')

# Initiate the multipart upload and send the part(s)
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
with open(path+fileName,'rb') as data:
    part1 = s3.upload_part(Bucket=bucket
                           , Key=key
                           , PartNumber=1
                           , UploadId=mpu['UploadId']
                           , Body=data)

# Next, we need to gather information about each part to complete
# the upload. Needed are the part number and ETag.
part_info = {
    'Parts': [
        {
            'PartNumber': 1,
            'ETag': part['ETag']
        }
    ]
}

# Now the upload works!
s3.complete_multipart_upload(Bucket=bucket
                             , Key=key
                             , UploadId=mpu['UploadId']
                             , MultipartUpload=part_info)

Question: Does anyone know how to use the multipart upload with boto3?

Upvotes: 25

Views: 59751

Answers (7)

Sunny Nazar

Reputation: 246

The copy method from boto3 is a managed transfer which will perform a multipart copy in multiple threads if necessary.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.copy

This works with objects greater than 5 GB, and I have already tested it.
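
For example, a minimal sketch of such a managed copy (the bucket and key names are placeholders):

import boto3

# copy() is a managed transfer, so boto3 switches to a multipart copy
# for large objects automatically.
s3 = boto3.resource('s3')
copy_source = {'Bucket': 'source-bucket', 'Key': 'bigFile.gz'}
s3.Object('target-bucket', 'bigFile.gz').copy(copy_source)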

Upvotes: 0

ybonda

Reputation: 1710

As described in the official boto3 documentation:

The AWS SDK for Python automatically manages retries and multipart and non-multipart transfers.

The management operations are performed by using reasonable default settings that are well-suited for most scenarios.

So all you need to do is set the desired multipart threshold value, which indicates the minimum file size above which the multipart upload will be handled automatically by the Python SDK:

import boto3
from boto3.s3.transfer import TransferConfig

# Set the desired multipart threshold value (5GB)
GB = 1024 ** 3
config = TransferConfig(multipart_threshold=5*GB)

# Perform the transfer
s3 = boto3.client('s3')
s3.upload_file('FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME', Config=config)

Moreover, you can also use the multithreading mechanism for multipart transfers by setting max_concurrency:

# To consume less downstream bandwidth, decrease the maximum concurrency
config = TransferConfig(max_concurrency=5)

# Download an S3 object
s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME', Config=config)

And finally, in case you want to perform the multipart transfer in a single thread, just set use_threads=False:

# Disable thread use/transfer concurrency
config = TransferConfig(use_threads=False)

s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME', Config=config)
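
These settings can also be combined in a single TransferConfig; a sketch, using the same placeholder names as above:

import boto3
from boto3.s3.transfer import TransferConfig

GB = 1024 ** 3

# Combine the threshold and concurrency settings in one config
config = TransferConfig(multipart_threshold=5*GB, max_concurrency=5)

s3 = boto3.client('s3')
s3.upload_file('FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME', Config=config)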

Complete source code with explanation: Python S3 Multipart File Upload with Metadata and Progress Indicator

Upvotes: 14

Mark Amery

Reputation: 154585

Your code was already correct. Indeed, a minimal example of a multipart upload just looks like this:

import boto3
s3 = boto3.client('s3')
s3.upload_file('my_big_local_file.txt', 'some_bucket', 'some_key')

You don't need to explicitly ask for a multipart upload, or use any of the lower-level functions in boto3 that relate to multipart uploads. Just call upload_file, and boto3 will automatically use a multipart upload if your file size is above a certain threshold (which defaults to 8MB).

You seem to have been confused by the fact that the end result in S3 wasn't visibly made up of multiple parts:

Result: 5.9 gig file on s3. Doesn't seem to contain multiple parts.

... but this is the expected outcome. The whole point of the multipart upload API is to let you upload a single file over multiple HTTP requests and end up with a single object in S3.
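
If you want to confirm that a multipart upload actually happened, one way (a sketch, reusing the placeholder bucket and key from above) is to look at the object's ETag, which for multipart uploads ends in a dash followed by the number of parts:

import boto3

s3 = boto3.client('s3')
# Multipart-uploaded objects report an ETag of the form "<hash>-<part count>",
# e.g. "d41d8cd98f00b204e9800998ecf8427e-738" for 738 parts.
print(s3.head_object(Bucket='some_bucket', Key='some_key')['ETag'])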

Upvotes: 18

deadcode

Reputation: 2296

I would advise you to use boto3.s3.transfer for this purpose. Here is an example:

import boto3
import boto3.s3.transfer


def upload_file(filename):
    session = boto3.Session()
    s3_client = session.client("s3")

    try:
        print("Uploading file: {}".format(filename))

        tc = boto3.s3.transfer.TransferConfig()
        t = boto3.s3.transfer.S3Transfer(client=s3_client, config=tc)

        t.upload_file(filename, "my-bucket-name", "name-in-s3.dat")

    except Exception as e:
        print("Error uploading: {}".format(e))
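
S3Transfer.upload_file also accepts a callback, which is handy for progress reporting on large multipart uploads; a sketch with placeholder bucket and file names:

import boto3
from boto3.s3.transfer import TransferConfig, S3Transfer


class ProgressPrinter:
    """Accumulate the byte counts reported by S3Transfer and print a running total."""
    def __init__(self):
        self._seen = 0

    def __call__(self, bytes_amount):
        self._seen += bytes_amount
        print("Transferred {} bytes so far".format(self._seen))


client = boto3.client("s3")
transfer = S3Transfer(client, TransferConfig())
transfer.upload_file("bigFile.gz", "my-bucket-name", "name-in-s3.dat",
                     callback=ProgressPrinter())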

Upvotes: 6

sarath kumar

Reputation: 1445

Change part to part1:

import boto3

bucket = 'bucket'
path = "/temp/"
fileName = "bigFile.gz"
key = 'key'

s3 = boto3.client('s3')

# Initiate the multipart upload and send the part(s)
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
with open(path+fileName,'rb') as data:
    part1 = s3.upload_part(Bucket=bucket
                           , Key=key
                           , PartNumber=1
                           , UploadId=mpu['UploadId']
                           , Body=data)

# Next, we need to gather information about each part to complete
# the upload. Needed are the part number and ETag.
part_info = {
    'Parts': [
        {
            'PartNumber': 1,
            'ETag': part1['ETag']
        }
    ]
}

# Now the upload works!
s3.complete_multipart_upload(Bucket=bucket
                             , Key=key
                             , UploadId=mpu['UploadId']
                             , MultipartUpload=part_info)

Upvotes: -1

Gourav Sengupta

Reputation: 33

Why not just use the copy option in boto3?

s3.copy(CopySource={'Bucket': sourceBucket,
                    'Key': sourceKey},
        Bucket=targetBucket,
        Key=targetKey,
        ExtraArgs={'ACL': 'bucket-owner-full-control'})

There are details on how to initialise the s3 object, and further options for the call, in the boto3 docs.
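
For instance, copy also accepts a Config argument to tune the multipart behaviour; a sketch (variable names as above, the threshold value is just an example):

from boto3.s3.transfer import TransferConfig

# Use a multipart copy for anything above 100 MB (example threshold)
s3.copy(CopySource={'Bucket': sourceBucket, 'Key': sourceKey},
        Bucket=targetBucket,
        Key=targetKey,
        Config=TransferConfig(multipart_threshold=100 * 1024 * 1024))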

Upvotes: 0

mdurant

Reputation: 28673

In your code snippet, part clearly should be part1 in the dictionary. Typically, you would have several parts (otherwise why use multipart upload?), and the 'Parts' list would contain an element for each part.
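
For completeness, a sketch of the several-parts case (bucket, key, and path are placeholders; note that every part except the last must be at least 5 MB):

import boto3

s3 = boto3.client('s3')
bucket, key, path = 'bucket', 'key', '/temp/bigFile.gz'
part_size = 100 * 1024 * 1024  # 100 MB per part (example value)

# Initiate the upload, then send the file chunk by chunk
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
with open(path, 'rb') as data:
    part_number = 1
    while True:
        chunk = data.read(part_size)
        if not chunk:
            break
        part = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                              UploadId=mpu['UploadId'], Body=chunk)
        parts.append({'PartNumber': part_number, 'ETag': part['ETag']})
        part_number += 1

# Complete the upload with one entry per part, in order
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu['UploadId'],
                             MultipartUpload={'Parts': parts})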

You may also be interested in the new Pythonic interface for dealing with S3: http://s3fs.readthedocs.org/en/latest/

Upvotes: 0
