sagism

Reputation: 921

Can I resume a download from aws s3?

I am using the Python boto3 library to download files from S3 to an IoT device over a cellular connection, which is often slow and unreliable.

Some files are quite large (around 250 MB, which is large for this scenario), and the network sometimes fails, or the device reboots, mid-download.

I would like to resume the download from the place it ended when the device rebooted. Is there any way to do it?

The aborted download does seem to keep the data received so far in a temporary file, so the data is there.

The goal is to economize data transfer and make the download more resilient.

I am using multipart uploads, but no resume happens by itself.

What I'm doing is something like this:

import boto3
from boto3.s3.transfer import TransferConfig
from botocore.client import Config

session = boto3.session.Session(region_name='eu-central-1', profile_name=profile)
s3client = session.client('s3', config=Config(signature_version='s3v4'))
MB = 1024 ** 2

config = TransferConfig(
    multipart_threshold=10*MB,
    num_download_attempts=100)

def upload():
    s3client.upload_file(Filename=localfile, Bucket=bucket, Key=key, Config=config)

def download():
    s3client.download_file(bucket, key, localfile, Config=config)

# upload from server...
upload()

# .... later, from IOT device
download()

Upvotes: 2

Views: 3995

Answers (2)

jarmod

Reputation: 78860

I don't believe that boto3 has a resumable download feature.

You could potentially implement one yourself by making use of ranged gets. Find the size of the object upfront using head_object, then split that into N ranges, download them individually (maybe K chunks in parallel, depending on your hardware), store them on the local file system as chunks, and re-compose them into the final download when all chunks complete.

response = client.get_object(
    Bucket='mybucket',
    Key='mykey',
    Range='bytes=10001-20000'
)
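A minimal sketch of that approach (the helper names `byte_ranges` and `download_resumable` are illustrative, not part of boto3, and `client` is assumed to be a `boto3.client('s3')`). This simplified variant downloads the ranges sequentially and appends them to the partial file, resuming from whatever is already on disk; a parallel variant would store each range as its own chunk file and re-compose at the end:

```python
import os

def byte_ranges(total, chunk_size, start=0):
    """Yield inclusive (first, last) byte ranges covering bytes [start, total)."""
    while start < total:
        last = min(start + chunk_size, total) - 1
        yield start, last
        start = last + 1

def download_resumable(client, bucket, key, localfile, chunk_size=10 * 1024 ** 2):
    """Resume a download by appending ranged GETs to the local partial file."""
    # Total object size, known upfront via HEAD.
    total = client.head_object(Bucket=bucket, Key=key)['ContentLength']
    # Whatever is already on disk is assumed to be a valid prefix of the object.
    done = os.path.getsize(localfile) if os.path.exists(localfile) else 0
    with open(localfile, 'ab') as f:
        for first, last in byte_ranges(total, chunk_size, start=done):
            resp = client.get_object(Bucket=bucket, Key=key,
                                     Range=f'bytes={first}-{last}')
            f.write(resp['Body'].read())
```

If the device reboots, calling `download_resumable` again picks up from the current file size, so at most one chunk of data is re-downloaded.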

Upvotes: 4

Tuan Vo

Reputation: 2085

From the terminal, you can use aws s3api for lower-level access to S3.

size=$(stat -c%s myfile.zip); aws s3api get-object --bucket BUCKETNAME --key myfile.zip --range "bytes=$size-" myfile.part; cat myfile.part >> myfile.zip

(Note: plain `stat myfile.zip` prints the whole stat record, not the size; `stat -c%s` prints just the byte count with GNU stat, `stat -f%z` on macOS/BSD.)

You can also call this command from Python, e.g. with subprocess. Not too hard.
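A Python equivalent using boto3 directly, instead of shelling out (the `resume_tail` name is illustrative; it assumes the local partial file is an exact prefix of the S3 object, and `client` is a `boto3.client('s3')`):

```python
import os

def resume_tail(client, bucket, key, localfile):
    """Fetch only the missing tail of the object and append it to localfile."""
    done = os.path.getsize(localfile) if os.path.exists(localfile) else 0
    total = client.head_object(Bucket=bucket, Key=key)['ContentLength']
    if done >= total:
        return  # nothing left to fetch
    # Open-ended range: everything from the current file size onward.
    resp = client.get_object(Bucket=bucket, Key=key, Range=f'bytes={done}-')
    with open(localfile, 'ab') as f:
        for chunk in resp['Body'].iter_chunks():
            f.write(chunk)
```

Like the shell version, this only works if the partial file was written front-to-back with no holes, which is the case when appending ranged GETs in order.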

Upvotes: 1
