Dinero

Reputation: 1160

Download the entire contents of a subfolder in an S3 bucket

I have a bucket in S3 called "sample-data". Inside the bucket I have folders labelled "A" to "Z".

Inside each alphabetical folder there are more files and folders. What is the fastest way to download an alphabetical folder and all of its contents?

For example: sample-data/a/foo.txt and sample-data/a/more_files/foo1.txt

In the above example the bucket sample-data contains a folder called a, which contains foo.txt and a folder called more_files that contains foo1.txt.

I know how to download a single file. For instance, if I wanted foo.txt I would do the following:

    import boto3

    s3 = boto3.client('s3')
    s3.download_file("sample-data", "a/foo.txt", "foo.txt")

However, I am wondering if I can download the folder called a and all of its contents in one go. Any help would be appreciated.

Upvotes: 8

Views: 22557

Answers (2)

baduker

Reputation: 20042

I think your best bet would be the awscli.

aws s3 cp --recursive s3://mybucket/your_folder_named_a path/to/your/destination

From the docs:

--recursive (boolean) Command is performed on all files or objects under the specified directory or prefix.
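
Applied to the bucket and folder from the question, the command would look something like this (the local destination path ./a is just an example):

aws s3 cp --recursive s3://sample-data/a ./a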

EDIT:

However, if you want to do this with boto3, try this:

import os
import errno
import boto3

client = boto3.client('s3')


def assert_dir_exists(path):
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise


def download_dir(bucket, path, target):
    # Handle missing / at end of prefix
    if not path.endswith('/'):
        path += '/'

    paginator = client.get_paginator('list_objects_v2')
    for result in paginator.paginate(Bucket=bucket, Prefix=path):
        # Download each file individually ('Contents' is absent when the prefix matches nothing)
        for key in result.get('Contents', []):
            # Calculate relative path
            rel_path = key['Key'][len(path):]
            # Skip paths ending in /
            if not key['Key'].endswith('/'):
                local_file_path = os.path.join(target, rel_path)
                # Make sure directories exist
                local_file_dir = os.path.dirname(local_file_path)
                assert_dir_exists(local_file_dir)
                client.download_file(bucket, key['Key'], local_file_path)


download_dir('your_bucket', 'your_folder', 'destination')
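
If you prefer to skip the paginator, here is a minimal sketch using boto3's resource API instead (bucket.objects.filter handles pagination for you); the bucket name, prefix, and destination directory below are placeholders:

import os
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('sample-data')  # placeholder bucket name

for obj in bucket.objects.filter(Prefix='a/'):  # placeholder prefix
    # Skip the zero-byte "folder" placeholder objects some tools create
    if obj.key.endswith('/'):
        continue
    local_path = os.path.join('destination', obj.key)  # placeholder destination
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    bucket.download_file(obj.key, local_path)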

Upvotes: 19

SWater

Reputation: 443

List all the objects under the folder you want to download, then iterate over them and download each file (a sketch of that loop follows below).

import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket=BUCKET,
    Prefix='DIR1/DIR2',
)

The response is a dict. The "Contents" key holds the list of objects, and each entry's "Key" is the file name.
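
To finish the job, a minimal sketch of that download loop might look like the following; LOCAL_DIR is a placeholder, and note that a single list_objects_v2 call returns at most 1000 keys, so larger folders need pagination (see the paginator in the other answer):

import os

# Iterate over the listed objects and download each one.
# 'Contents' is absent from the response when the prefix matches nothing.
for obj in response.get("Contents", []):
    key = obj["Key"]
    if key.endswith("/"):  # skip "folder" placeholder objects
        continue
    local_path = os.path.join(LOCAL_DIR, key)  # LOCAL_DIR is a placeholder
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    s3.download_file(BUCKET, key, local_path)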

Here is more information:

list all files in a bucket

boto3 documentation

I am not sure if this is the fastest solution, but it may help you.

Upvotes: 0
