MAC

Reputation: 172

Recursively print folders and subfolders in S3

I have an S3 bucket, mybucket, containing a folder called MyDocuments. Within MyDocuments there are three levels of folders: year, month and day. The overall hierarchy looks something like this:

mybucket
`-- MyDocuments
    `-- year=2021
        |-- month=01
        |   |-- day=10
        |   |   |-- file1
        |   |   `-- file2
        |   `-- day=20
        |       |-- file3
        |       `-- file4
        `-- month=02
            |-- day=30
            |   |-- file11
            |   `-- file21
            `-- day=20
                |-- file31
                `-- file41

Using boto3, I'd like to get the following list of the day-level subfolders under MyDocuments:

year=2021/month=01/day=10
year=2021/month=01/day=20
year=2021/month=02/day=30
year=2021/month=02/day=20

What I've tried so far:

import boto3
s3client = boto3.client('s3')
resp = s3client.list_objects(Bucket='mybucket', Prefix='MyDocuments', Delimiter="/")
sections = [x['Prefix'] for x in resp['CommonPrefixes']]
print(sections)

But this only gives me one level of subfolder:

['mybucket/MyDocuments/year=2021/'] 
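For reference, the single level comes from how the Delimiter parameter works: S3 rolls every key under the prefix up to the first "/" that follows the prefix and returns those rolled-up strings as CommonPrefixes, so one call never descends more than one level. The sketch below is a purely local imitation of that grouping (the function name simulate_delimiter_listing is hypothetical and not part of boto3), run on a few of the keys from the tree above with the prefix written as 'MyDocuments/'.

```python
from typing import Iterable, List, Tuple


def simulate_delimiter_listing(keys: Iterable[str], prefix: str,
                               delimiter: str = '/') -> Tuple[List[str], List[str]]:
    """Local imitation (hypothetical helper) of how S3 splits a listing
    into plain objects (Contents) and CommonPrefixes."""
    contents: List[str] = []
    common: List[str] = []
    # Sorting keeps keys that share a common prefix next to each other.
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Roll the key up to the FIRST delimiter after the prefix.
            cp = prefix + rest.split(delimiter, 1)[0] + delimiter
            if not common or common[-1] != cp:
                common.append(cp)
        else:
            contents.append(key)
    return contents, common


keys = [
    'MyDocuments/year=2021/month=01/day=10/file1',
    'MyDocuments/year=2021/month=01/day=20/file3',
    'MyDocuments/year=2021/month=02/day=30/file11',
]
print(simulate_delimiter_listing(keys, 'MyDocuments/')[1])
# ['MyDocuments/year=2021/']
print(simulate_delimiter_listing(keys, 'MyDocuments/year=2021/')[1])
# ['MyDocuments/year=2021/month=01/', 'MyDocuments/year=2021/month=02/']
```

Each call only reveals the next level down, which is why getting to the day folders requires repeating the listing with each returned prefix.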

I have millions of files within each day folder. How can I do this without listing every file in the bucket?

Upvotes: 2

Views: 166

Answers (1)

Marcin

Reputation: 238051

I think it would be easier to use S3 collection filters:

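import boto3

s3r = boto3.resource('s3')
bucket = s3r.Bucket('mybucket')

prefix = 'MyDocuments/'
folders = set()
# filter() iterates over every object under the prefix, fetching pages as needed
for obj in bucket.objects.filter(Prefix=prefix):
    rel = obj.key[len(prefix):]  # e.g. 'year=2021/month=01/day=10/file1'
    if '/' in rel:
        folders.add(rel.rsplit('/', 1)[0])  # drop the file name; the set removes duplicates
for folder in sorted(folders):
    print(folder)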
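Note that the collection filter still enumerates every object under MyDocuments/, i.e. every one of the millions of files, even though only folder names are wanted. If the hierarchy is known to be exactly three levels deep (year/month/day), a sketch like the one below asks S3 only for CommonPrefixes, one level at a time, and stops at the day level, so the files themselves are never listed. The helper names list_folders_at_depth and relative_folders are hypothetical; the only boto3 calls assumed are the client's get_paginator('list_objects_v2') and its paginate(Bucket=..., Prefix=..., Delimiter=...).

```python
from typing import Any, List


def list_folders_at_depth(client: Any, bucket: str, prefix: str, depth: int) -> List[str]:
    """Return the folder prefixes exactly `depth` levels below `prefix`.

    Each level is listed with Delimiter='/', so S3 returns the next level of
    folders as CommonPrefixes. Recursion stops when `depth` reaches 0, so the
    objects inside the deepest folders (the day=... folders) are never listed.
    """
    if depth <= 0:
        return [prefix]
    folders: List[str] = []
    # The paginator follows continuation tokens (up to 1000 entries per page).
    paginator = client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/'):
        # 'CommonPrefixes' is absent from a page that has no sub-folders.
        for common in page.get('CommonPrefixes', []):
            folders.extend(
                list_folders_at_depth(client, bucket, common['Prefix'], depth - 1))
    return folders


def relative_folders(client: Any, bucket: str, prefix: str, depth: int) -> List[str]:
    """Strip `prefix` and the trailing '/' to match the format in the question."""
    return [p[len(prefix):].rstrip('/')
            for p in list_folders_at_depth(client, bucket, prefix, depth)]
```

With a real client, `relative_folders(boto3.client('s3'), 'mybucket', 'MyDocuments/', 3)` should return strings such as year=2021/month=01/day=10. The number of requests grows with the number of year and month folders rather than with the number of files, assuming the year and month levels do not themselves hold large numbers of files.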

Upvotes: 2
