Reputation: 3916
I am trying to fetch a list of subdirectories in an S3 bucket without returning any filenames.
My S3 bucket has the following structure.
s3://my-bucket/databases/mysql-<date>-<hour> # host-2022-09-09-10
s3://my-bucket/databases/mysql-<date>-<hour>/tarfiles.tar.gz
I am trying to return only directory names like mysql-<date>-<hour>. I don't need any further subdirectories or filenames inside mysql-xx.
As everything is stored as objects, I couldn't find any solution like setting a depth level, etc.
My code:
import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket(S3_BUCKET)
prefix = 'databases/mysql-'
# Lists every object key under the prefix, including the files inside each directory
for item in my_bucket.objects.filter(Prefix=prefix):
    st.write(item.key)
The other option is to filter the filenames in Python, grep-style. But that won't help, because every request would scan all the files and return the entire list, which then has to be filtered. That gets unnecessarily expensive.
Thank you!
Upvotes: 0
Views: 782
Reputation: 10832
You want to list the common prefixes under a given prefix.
This is supported in the underlying API, though boto3's "resource" object model does not support showing prefixes for a given resource. To accomplish this, you'll need to use the lower level "client" interface:
import boto3

prefix = 'databases/mysql-'
s3 = boto3.client('s3')
paginator = s3.get_paginator("list_objects_v2")
# Specify the prefix to scan, and the delimiter used to group keys into common prefixes
for page in paginator.paginate(Bucket=S3_BUCKET, Prefix=prefix, Delimiter='/'):
    # CommonPrefixes is absent from a page that has no grouped keys
    for common_prefix in page.get("CommonPrefixes", []):
        print(common_prefix['Prefix'])
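Each entry printed this way is a full prefix such as databases/mysql-<date>-<hour>/, not the bare directory name. If you only want the mysql-<date>-<hour> part, a small helper can strip the parent prefix and the trailing slash. The following is a minimal sketch: the function name directory_names, the parent argument, and the sample pages are made up for illustration, and the pages are hand-written dictionaries that only mimic the shape of the list_objects_v2 responses so the snippet runs without AWS access.

```python
def directory_names(pages, parent="databases/"):
    """Collect bare directory names from list_objects_v2-style pages.

    `pages` is any iterable of response dicts, for example the result of
    paginator.paginate(...). `parent` is a hypothetical argument naming the
    leading part of each prefix to drop.
    """
    names = []
    for page in pages:
        # A page with no grouped keys has no "CommonPrefixes" entry at all
        for entry in page.get("CommonPrefixes", []):
            full = entry["Prefix"]
            if full.startswith(parent):
                full = full[len(parent):]
            # Drop the trailing delimiter that S3 keeps on each common prefix
            names.append(full.rstrip("/"))
    return names


# Hand-written sample pages (assumed data, not real bucket contents)
sample_pages = [
    {"CommonPrefixes": [{"Prefix": "databases/mysql-2022-09-09-10/"}]},
    {"CommonPrefixes": [{"Prefix": "databases/mysql-2022-09-09-11/"}]},
    {},  # a page without any common prefixes
]

print(directory_names(sample_pages))
```

With the real client you would pass paginator.paginate(Bucket=S3_BUCKET, Prefix=prefix, Delimiter='/') as pages instead of the sample list.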
Upvotes: 2