kwsong0314

Reputation: 251

How to download everything in that folder using boto3

I want to download all the CSV files in an S3 folder (2021-02-15). I tried the following, but it failed. How can I do it?

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
key = 'product/myproject/2021-02-15/'
objs = list(bucket.objects.filter(Prefix=key))
for obj in objs:
    client = boto3.client('s3')
    client.download_file(bucket, obj, obj)

ValueError: Filename must be a string

Upvotes: 3

Views: 11672

Answers (5)

tomthedude

Reputation: 41

Building on the answer from Marcello, I found I had to check whether each item was a file or a "directory" in S3 before downloading it. I also adjusted the method slightly so that the items are downloaded into a specified local folder.

    import os
    from pathlib import Path

    import boto3

    s3 = boto3.resource('s3')
    download_directory = './download_directory'  # replace with your root directory
    bucket = s3.Bucket('bucket') # replace with your bucket
    objs = list(bucket.objects.filter(Prefix='folder')) # replace with your s3 'folder'

    for obj in objs:
        if obj.key.endswith('/'):
            continue # if the obj is a folder / directory then skip it

        # mirror the S3 key underneath the local download directory
        local_file_path = os.path.join(download_directory, obj.key)

        # create the parent directories before downloading
        Path(os.path.dirname(local_file_path)).mkdir(parents=True, exist_ok=True)
        bucket.download_file(obj.key, local_file_path)
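
Since the question asks only for the CSV files, the same loop could be wrapped in a function that also filters by extension. This is a minimal sketch, assuming `bucket` is a boto3 `s3.Bucket` resource; the function name `download_prefix` and its `suffix` parameter are hypothetical and not part of boto3:

```python
import os
from pathlib import Path


def download_prefix(bucket, prefix, download_directory, suffix=".csv"):
    """Download every object under `prefix` whose key ends with `suffix`.

    `bucket` is assumed to be a boto3 s3.Bucket resource (anything exposing
    objects.filter(Prefix=...) and download_file(key, filename) works).
    Returns the list of local file paths that were written.
    """
    downloaded = []
    for obj in bucket.objects.filter(Prefix=prefix):
        # Skip "directory" placeholders and keys without the wanted suffix
        if obj.key.endswith("/") or not obj.key.endswith(suffix):
            continue
        local_file_path = os.path.join(download_directory, obj.key)
        # Create the parent directories before downloading
        Path(os.path.dirname(local_file_path)).mkdir(parents=True, exist_ok=True)
        bucket.download_file(obj.key, local_file_path)
        downloaded.append(local_file_path)
    return downloaded


# Usage (requires boto3 and AWS credentials):
# import boto3
# bucket = boto3.resource("s3").Bucket("bucket")
# download_prefix(bucket, "product/myproject/2021-02-15/", "./download_directory")
```

Passing the bucket in as an argument keeps the function independent of how the boto3 session is created.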

Upvotes: 0

hume

Reputation: 2553

You could also use cloudpathlib which, for S3, wraps boto3. For your use case, it's pretty simple:

from cloudpathlib import CloudPath

cp = CloudPath("s3://bucket/product/myproject/2021-02-15/")
cp.download_to("local_folder")

Upvotes: 2

Marcello Romani

Reputation: 3144

Marcin's answer is correct, but files with the same name in different paths would overwrite each other. You can avoid that by replicating the folder structure of the S3 bucket locally.

import boto3
import os
from pathlib import Path

s3 = boto3.resource('s3')

bucket = s3.Bucket('bucket')

key = 'product/myproject/2021-02-15/'
objs = list(bucket.objects.filter(Prefix=key))

for obj in objs:
    # print(obj.key)

    # remove the file name from the object key
    obj_path = os.path.dirname(obj.key)

    # create nested directory structure
    Path(obj_path).mkdir(parents=True, exist_ok=True)

    # save file with full path locally
    bucket.download_file(obj.key, obj.key)
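
To see why keeping the full key avoids collisions, the mapping from object key to local path can be computed without touching S3. This is only a small sketch: `flat_name` and `nested_path` are illustrative helper names, and the two example keys are made up:

```python
import os


def flat_name(key):
    """Local name when only the last path component of the key is kept."""
    return key.split("/")[-1]


def nested_path(key, root="."):
    """Local path when the key's folder structure is replicated under root."""
    return os.path.join(root, *key.split("/"))


# Hypothetical keys: the same file name under two different S3 "folders"
keys = [
    "product/myproject/2021-02-15/a/data.csv",
    "product/myproject/2021-02-15/b/data.csv",
]

flat = {flat_name(k) for k in keys}
nested = {nested_path(k) for k in keys}

print(len(flat))    # 1: both keys collapse to "data.csv"
print(len(nested))  # 2: each key keeps its own local path
```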

Upvotes: 7

Krishna Chaurasia

Reputation: 9572

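filter() returns a collection of ObjectSummary objects, not key strings, whereas the client's download_file() method expects strings: the bucket name, the object key, and the local filename.

Try this:

objs = list(bucket.objects.filter(Prefix=key))
client = boto3.client('s3')
for obj in objs:
    if obj.key.endswith('/'):
        continue  # skip folder placeholders
    # bucket.name and obj.key are strings; save under the last path component
    client.download_file(bucket.name, obj.key, obj.key.split('/')[-1])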

You could also call print(obj) inside the loop to see what each object actually contains.
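
With the low-level client, all three arguments to `download_file` are strings: bucket name, object key, and local filename. Each item yielded by `bucket.objects.filter(...)` is an `ObjectSummary` carrying `bucket_name` and `key` attributes, so the call could be sketched roughly as below. The helper name `download_with_client` is hypothetical, and saving each file under the last component of its key is an assumption about the desired local layout:

```python
import os


def download_with_client(client, summaries, dest_dir="."):
    """Download each object summary using a boto3 S3 client.

    `summaries` is assumed to be an iterable of s3.ObjectSummary-like
    objects exposing `bucket_name` and `key` attributes.
    Returns the list of local filenames passed to the client.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for obj in summaries:
        if obj.key.endswith("/"):
            continue  # folder placeholder, nothing to download
        filename = os.path.join(dest_dir, os.path.basename(obj.key))
        # Bucket, Key and Filename are all plain strings here
        client.download_file(obj.bucket_name, obj.key, filename)
        saved.append(filename)
    return saved


# Usage (requires boto3 and AWS credentials):
# import boto3
# bucket = boto3.resource("s3").Bucket("bucket")
# objs = bucket.objects.filter(Prefix="product/myproject/2021-02-15/")
# download_with_client(boto3.client("s3"), objs, "downloads")
```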

Upvotes: 1

Marcin

Reputation: 238189

Since you are using the resource API, you can use the bucket's download_file method:

import boto3

s3 = boto3.resource('s3')

bucket = s3.Bucket('bucket')

key = 'product/myproject/2021-02-15/'
objs = list(bucket.objects.filter(Prefix=key))

for obj in objs:
    #print(obj.key)
    out_name = obj.key.split('/')[-1]
    bucket.download_file(obj.key, out_name)

Upvotes: 3
