Reputation: 172
I have an S3 bucket mybucket, within which I have a folder called MyDocuments. Within MyDocuments there are three layers of folders: year, month, and day. Overall, the hierarchy looks something like this:
mybucket
`-- MyDocuments
    `-- year=2021
        |-- month=01
        |   |-- day=10
        |   |   |-- file1
        |   |   `-- file2
        |   `-- day=20
        |       |-- file3
        |       `-- file4
        `-- month=02
            |-- day=30
            |   |-- file11
            |   `-- file21
            `-- day=20
                |-- file31
                `-- file41
Using boto3, I'd like to return the following results, i.e. the subfolders under MyDocuments:
year=2021/month=01/day=10
year=2021/month=01/day=20
year=2021/month=02/day=30
year=2021/month=02/day=20
What I've tried so far:
import boto3
s3client = boto3.client('s3')
resp = s3client.list_objects(Bucket='mybucket', Prefix='MyDocuments', Delimiter="/")
sections = [x['Prefix'] for x in resp['CommonPrefixes']]
print(sections)
But this only gives me one level of subfolders:
['mybucket/MyDocuments/year=2021/']
I have millions of files within each day. How can I do this without listing all of the objects?
Upvotes: 2
Views: 166
Reputation: 238051
I think it would be easier to use the S3 collection filters:
import boto3
s3r = boto3.resource('s3')
bucket = s3r.Bucket('mybucket')
for obj in bucket.objects.filter(Prefix='MyDocuments/').all():
    print(obj.key.rsplit('/', 1)[0])
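Note that this iterates over every object under the prefix (and prints the same prefix once per file), which can be slow with millions of files per day. Since the question already uses Delimiter="/", a cheaper sketch is to walk the CommonPrefixes level by level, so S3 returns only the "subfolder" names at each step. walk_prefixes and fake_children below are hypothetical names; fake_children is an in-memory stand-in for the real S3 call, which is shown in a comment and assumed to use the list_objects_v2 paginator:

```python
def walk_prefixes(list_children, prefix, depth):
    """Collect all prefixes `depth` levels below `prefix`.
    `list_children(prefix)` must return the immediate child prefixes,
    the way list_objects_v2 with Delimiter='/' returns CommonPrefixes."""
    if depth == 0:
        return [prefix]
    out = []
    for child in list_children(prefix):
        out.extend(walk_prefixes(list_children, child, depth - 1))
    return out

# With boto3 (not run here), list_children could look like:
# s3client = boto3.client('s3')
# def list_children(prefix):
#     paginator = s3client.get_paginator('list_objects_v2')
#     return [cp['Prefix']
#             for page in paginator.paginate(Bucket='mybucket',
#                                            Prefix=prefix, Delimiter='/')
#             for cp in page.get('CommonPrefixes', [])]

# In-memory stand-in that mimics S3's Delimiter behaviour on a key list:
keys = [
    'MyDocuments/year=2021/month=01/day=10/file1',
    'MyDocuments/year=2021/month=01/day=20/file3',
    'MyDocuments/year=2021/month=02/day=30/file11',
]

def fake_children(prefix):
    children = []
    for key in keys:
        if key.startswith(prefix):
            rest = key[len(prefix):]
            if '/' in rest:
                child = prefix + rest.split('/', 1)[0] + '/'
                if child not in children:
                    children.append(child)
    return children

# Three levels down from MyDocuments/ are the day= prefixes:
for day in walk_prefixes(fake_children, 'MyDocuments/', 3):
    print(day[len('MyDocuments/'):].rstrip('/'))
# year=2021/month=01/day=10
# year=2021/month=01/day=20
# year=2021/month=02/day=30
```

This way only the directory-like prefixes travel over the wire at each level, never the per-file listings, which should matter when each day holds millions of objects.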
Upvotes: 2