Reputation: 13593
The AWS CLI provides the aws s3 sync command to sync data between two locations.
Is there an equivalent command in boto3?
I can't find this kind of command in the boto3 documentation.
Upvotes: 10
Views: 16374
Reputation: 1643
This is the solution you want if you don't want to run subprocesses. It is not as complete as aws s3 sync, but it will recursively download only the files that are newer on S3. Tested on Linux and macOS; it may require some edits on Windows to handle paths.
import os
import boto3
import datetime


def synchronize_s3_folder(s3_path: str, local_dir: str):
    """
    Download the contents of a folder recursively into a directory,
    skipping files whose local copy is newer than the S3 object.

    Args:
        s3_path: the folder path in the s3 bucket, e.g. "s3://bucket/prefix"
        local_dir: a relative or absolute directory path in the local file system
    """
    assert s3_path.startswith("s3://")
    s3_path = s3_path.removeprefix("s3://")
    # split on "/" explicitly: S3 keys always use forward slashes,
    # while os.sep is "\\" on Windows
    bucket_name, *path_parts = s3_path.split("/")
    s3_folder = "/".join(path_parts)
    # assumes credentials & configuration are handled outside python
    # in the .aws directory or in environment variables
    s3_resource = boto3.resource('s3')
    s3_client = boto3.client('s3')
    bucket = s3_resource.Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=s3_folder):
        target = os.path.join(local_dir, os.path.relpath(obj.key, s3_folder))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # skip "directory" placeholder objects
        if obj.key.endswith('/'):
            continue
        # getting metadata of the s3 object
        meta_data = s3_client.head_object(Bucket=bucket.name, Key=obj.key)
        # checking whether the s3 file is newer and needs updating
        if os.path.isfile(target):
            # LastModified is already timezone-aware (UTC)
            s3_last_modified = meta_data['LastModified']
            local_last_modified = datetime.datetime.fromtimestamp(
                os.path.getmtime(target), tz=datetime.timezone.utc
            )
            if local_last_modified > s3_last_modified:
                continue
        # download to a temporary file first, then rename, so an
        # interrupted transfer never leaves a truncated target file
        tmp_target = f"{target}.tmp"
        bucket.download_file(obj.key, tmp_target)
        os.rename(tmp_target, target)
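Usage looks like this (the bucket name and paths are placeholders, not from the original answer):

# mirror s3://my-bucket/datasets into ./datasets, skipping files
# whose local copy is newer than the S3 object
synchronize_s3_folder("s3://my-bucket/datasets", "./datasets")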
Upvotes: 0
Reputation: 21
To add to the above: if you call the process from within a Python script, don't have global AWS credentials set, and don't want to run "aws configure" for whatever reason, the following should work.
import os
import subprocess

# inject credentials via environment variables instead of "aws configure";
# access_key and secret_key are placeholders for your actual credentials
os.environ['AWS_ACCESS_KEY_ID'] = access_key
os.environ['AWS_SECRET_ACCESS_KEY'] = secret_key

# note the "s3" subcommand: the CLI command is "aws s3 sync", not "aws sync"
command = "s3 sync local_folder s3://s3_folder"
aws_cli_command = f"aws {command}"
result = subprocess.run(aws_cli_command, shell=True, capture_output=True, text=True)
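Note that subprocess.run inherits the parent's environment by default, which is why exporting the keys via os.environ works. An alternative sketch that passes the credentials only to the child process, leaving os.environ untouched (same placeholder names as above):

env = {**os.environ,
       'AWS_ACCESS_KEY_ID': access_key,       # placeholder
       'AWS_SECRET_ACCESS_KEY': secret_key}   # placeholder
result = subprocess.run("aws s3 sync local_folder s3://s3_folder",
                        shell=True, capture_output=True, text=True, env=env)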
Upvotes: 1
Reputation: 20052
boto3 does not include s3 sync capabilities. That is only available via the AWS CLI tool.
Interestingly, there's still an open issue at boto's GitHub that dates back to... 2015.
I guess your best bet is to run the aws s3 sync command from within a Python script. Here's a sample implementation.
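A minimal sketch of that approach, assuming the AWS CLI is installed and on the PATH (the bucket and directory names are placeholders):

import subprocess

def s3_sync(source: str, destination: str) -> None:
    # invoke the AWS CLI directly; passing an argument list avoids shell=True
    subprocess.run(
        ["aws", "s3", "sync", source, destination],
        check=True,  # raise CalledProcessError if the sync fails
    )

# e.g. mirror a bucket prefix into a local directory
s3_sync("s3://my-bucket/data", "./data")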
Alternatively, you might want to explore the DataSync client.
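If you go that route, starting an execution of a pre-configured task looks roughly like this (the task ARN is a placeholder, and the task itself must be created beforehand, e.g. in the DataSync console):

import boto3

datasync = boto3.client('datasync')
# kick off an execution of an existing DataSync task
response = datasync.start_task_execution(
    TaskArn='arn:aws:datasync:us-east-1:123456789012:task/task-0123456789abcdef0'  # placeholder
)
print(response['TaskExecutionArn'])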
Upvotes: 8