Brian

Reputation: 13593

Is it possible to run aws s3 sync with boto3?

AWS CLI provides aws s3 sync command to sync data between 2 locations.

Is there an equivalent command in boto3?

I can't find this kind of command in boto3 documentation.

Upvotes: 10

Views: 16374

Answers (3)

Luca Di Liello

Reputation: 1643

This is the solution you want if you don't want to run subprocesses. It is not as complete as aws s3 sync, but it downloads data recursively and only fetches files that are newer on S3 than the local copy. Tested on Linux and macOS; it may require some edits on Windows to handle paths.

import datetime
import os

import boto3


def synchronize_s3_folder(s3_path: str, local_dir: str):
    r"""
    Download the contents of an S3 folder recursively into a local directory.

    Args:
        s3_path: the folder path in the s3 bucket, e.g. "s3://bucket/prefix"
        local_dir: a relative or absolute directory path in the local file system
    """

    assert s3_path.startswith("s3://")
    s3_path = s3_path.removeprefix("s3://")

    # S3 keys always use forward slashes, regardless of the local OS separator
    bucket_name, _, s3_folder = s3_path.partition("/")

    # assumes credentials & configuration are handled outside Python,
    # e.g. in the ~/.aws directory or via environment variables
    s3_resource = boto3.resource('s3')
    s3_client = boto3.client('s3')
    bucket = s3_resource.Bucket(bucket_name)

    for obj in bucket.objects.filter(Prefix=s3_folder):
        target = os.path.join(local_dir, os.path.relpath(obj.key, s3_folder))

        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        if obj.key.endswith('/'):
            # skip "directory" placeholder objects
            continue

        # getting metadata of s3 object
        meta_data = s3_client.head_object(Bucket=bucket.name, Key=obj.key)

        # skip the download if the local copy is newer than the S3 object
        if os.path.isfile(target):
            # LastModified is already a timezone-aware UTC datetime
            s3_last_modified = meta_data['LastModified']
            local_last_modified = datetime.datetime.fromtimestamp(
                os.path.getmtime(target), tz=datetime.timezone.utc
            )
            if local_last_modified > s3_last_modified:
                continue

        # download to a temporary file first so an interrupted
        # transfer never leaves a truncated file behind
        tmp_target = f"{target}.tmp"
        bucket.download_file(obj.key, tmp_target)
        os.replace(tmp_target, target)
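One detail worth isolating: S3 keys always use forward slashes, so the bucket/prefix split must not depend on the local OS separator. A minimal, hypothetical helper (the name `split_s3_path` is illustrative, not part of boto3):

```python
def split_s3_path(s3_path: str):
    """Split "s3://bucket/some/prefix" into ("bucket", "some/prefix")."""
    assert s3_path.startswith("s3://")
    # partition on the first "/" only; everything after it is the key prefix
    bucket, _, prefix = s3_path.removeprefix("s3://").partition("/")
    return bucket, prefix
```

Because it never touches `os.sep`, this behaves identically on Linux, macOS, and Windows.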

Upvotes: 0

To add to the above: if you call the process from within a Python script, don't have global AWS credentials set, and don't want to run "aws configure" for whatever reason, the following should work.

import os
import subprocess

# placeholders - substitute your own credentials
os.environ['AWS_ACCESS_KEY_ID'] = access_key
os.environ['AWS_SECRET_ACCESS_KEY'] = secret_key

# note: the subcommand is "aws s3 sync", not "aws sync"
aws_cli_command = "aws s3 sync local_folder s3://s3_folder"
result = subprocess.run(aws_cli_command, shell=True, capture_output=True, text=True)

Upvotes: 1

baduker
baduker

Reputation: 20052

boto3 does not include s3 sync capabilities. That is only available via the AWS CLI tool.

Interestingly, there's still an open issue at boto's GitHub that dates back to... 2015.

I guess your best bet is to run the aws s3 sync from within a Python script.

Here's a sample implementation.
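If you go that route, a minimal sketch of one way to do it (the helper name and flags shown are illustrative; running it assumes the AWS CLI is installed and on your PATH):

```python
import subprocess


def build_s3_sync_command(source: str, dest: str,
                          delete: bool = False, dryrun: bool = False):
    """Build an `aws s3 sync` invocation as an argument list (avoids shell=True)."""
    cmd = ["aws", "s3", "sync", source, dest]
    if delete:
        cmd.append("--delete")   # remove files in dest that are absent from source
    if dryrun:
        cmd.append("--dryrun")   # show what would transfer without doing it
    return cmd


# running it requires the AWS CLI and configured credentials:
# subprocess.run(build_s3_sync_command("local_folder", "s3://my-bucket/prefix"),
#                check=True)
```

Passing an argument list instead of a shell string sidesteps quoting problems with paths that contain spaces.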

Alternatively, you might want to explore the DataSync client.

Upvotes: 8
