sagism

Reputation: 921

Can I resume a download from aws s3?

I am using the Python boto3 library to download files from S3 to an IoT device over a cellular connection, which is often slow and unreliable.

Some files are quite large (around 250 MB, which is large for this scenario), and the network sometimes fails, or the device reboots, mid-download.

I would like to resume the download from the place it ended when the device rebooted. Is there any way to do it?

The aborted download does seem to keep the data received so far in a temporary file, so the data is there.

The goal is to economize data transfer and make the download more resilient.

I am using multipart uploads, but no resume happens by itself.

What I'm doing is something like this:

import boto3
from boto3.s3.transfer import TransferConfig
from botocore.client import Config

session = boto3.session.Session(region_name='eu-central-1', profile_name=profile)
s3client = session.client('s3', config=Config(signature_version='s3v4'))
MB = 1024 ** 2

config = TransferConfig(
    multipart_threshold=10*MB,
    num_download_attempts=100)

def upload():
    s3client.upload_file(Filename=localfile, Bucket=bucket, Key=key, Config=config)

def download():
    s3client.download_file(bucket, key, localfile, Config=config)

# upload from server...
upload()

# .... later, from IOT device
download()

Upvotes: 2

Views: 3995

Answers (2)

jarmod

Reputation: 78860

I don't believe that boto3 has a resumable download feature.

You could potentially implement one yourself by making use of ranged gets. Find the size of the object upfront using head_object, then split that into N ranges, download them individually (maybe K chunks in parallel, depending on your hardware), store them on the local file system as chunks, and re-compose them into the final download when all chunks complete.

response = client.get_object(
    Bucket='mybucket',
    Key='mykey',
    Range='bytes=10001-20000'
)
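A minimal sketch of that approach (the helper names `byte_ranges` and `download_resumable` are illustrative, not part of boto3, and `client` is assumed to be a `boto3.client('s3')`). This simplified variant downloads the ranges sequentially and appends them to the partial file, resuming from whatever is already on disk; a parallel variant would store each range as its own chunk file and re-compose at the end:

```python
import os

def byte_ranges(total, chunk_size, start=0):
    """Yield inclusive (first, last) byte ranges covering bytes [start, total)."""
    while start < total:
        last = min(start + chunk_size, total) - 1
        yield start, last
        start = last + 1

def download_resumable(client, bucket, key, localfile, chunk_size=10 * 1024 ** 2):
    """Resume a download by appending ranged GETs to the local partial file."""
    # Total object size, known upfront via HEAD.
    total = client.head_object(Bucket=bucket, Key=key)['ContentLength']
    # Whatever is already on disk is assumed to be a valid prefix of the object.
    done = os.path.getsize(localfile) if os.path.exists(localfile) else 0
    with open(localfile, 'ab') as f:
        for first, last in byte_ranges(total, chunk_size, start=done):
            resp = client.get_object(Bucket=bucket, Key=key,
                                     Range=f'bytes={first}-{last}')
            f.write(resp['Body'].read())
```

If the device reboots, calling `download_resumable` again picks up from the current file size, so at most one chunk of data is re-downloaded.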

Upvotes: 4

Tuan Vo

Reputation: 2085

From the terminal, you can use aws s3api for lower-level access to S3.

size=$(stat -c%s myfile.zip); aws s3api get-object --bucket BUCKETNAME --key myfile.zip --range "bytes=$size-" myfile.part; cat myfile.part >> myfile.zip

(Note: plain `stat myfile.zip` prints the whole stat record, not the size; `stat -c%s` prints just the byte count with GNU stat, `stat -f%z` on macOS/BSD.)

You can also call this command from Python, e.g. with subprocess. Not too hard.
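A Python equivalent using boto3 directly, instead of shelling out (the `resume_tail` name is illustrative; it assumes the local partial file is an exact prefix of the S3 object, and `client` is a `boto3.client('s3')`):

```python
import os

def resume_tail(client, bucket, key, localfile):
    """Fetch only the missing tail of the object and append it to localfile."""
    done = os.path.getsize(localfile) if os.path.exists(localfile) else 0
    total = client.head_object(Bucket=bucket, Key=key)['ContentLength']
    if done >= total:
        return  # nothing left to fetch
    # Open-ended range: everything from the current file size onward.
    resp = client.get_object(Bucket=bucket, Key=key, Range=f'bytes={done}-')
    with open(localfile, 'ab') as f:
        for chunk in resp['Body'].iter_chunks():
            f.write(chunk)
```

Like the shell version, this only works if the partial file was written front-to-back with no holes, which is the case when appending ranged GETs in order.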

Upvotes: 1
