Anto

Reputation: 3916

Python boto3 S3 client: filter subdirectories and depth

I am trying to fetch a list of subdirectories in an S3 bucket without returning any filenames.

My S3 bucket has the following structure:

s3://my-bucket/databases/mysql-<date>-<hour>    # host-2022-09-09-10
s3://my-bucket/databases/mysql-<date>-<hour>/tarfiles.tar.gz

I am trying to return only directory names like mysql-<date>-<hour>. I don't need any deeper subdirectories or the filenames inside each mysql-xx directory.

Since everything in S3 is stored as objects, I couldn't find an option such as setting a depth level.

My code:

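        import boto3

        s3 = boto3.resource('s3')
        my_bucket = s3.Bucket(S3_BUCKET)  # S3_BUCKET is defined elsewhere
        prefix = 'databases/mysql-'
        # Lists every object under the prefix, including the .tar.gz files
        for item in my_bucket.objects.filter(Prefix=prefix):
            st.write(item.key)  # output each object key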

Another option is to grep/filter the filenames in Python, but that doesn't help: every request would still list all the files, and the entire list would then have to be filtered client-side, which gets unnecessarily expensive.
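For comparison, the client-side filtering described above could look roughly like the sketch below. `directory_names_from_keys` is a hypothetical helper (not part of boto3), and the example keys are made up to match the structure shown earlier. It reduces object keys to the unique first path segment under a parent prefix, but it only works once every key has already been listed, which is exactly the cost the question wants to avoid.

```python
def directory_names_from_keys(keys, parent="databases/"):
    """Return the unique first path segment under `parent` for each key.

    Hypothetical helper: `keys` must already contain every object key,
    so the full listing cost is paid before any filtering happens.
    """
    names = set()
    for key in keys:
        if not key.startswith(parent):
            continue  # key lives outside the parent "directory"
        rest = key[len(parent):]
        # Only keys with a "/" after the parent sit inside a subdirectory
        if "/" in rest:
            names.add(rest.split("/", 1)[0])
    return sorted(names)


# Made-up example keys following the question's layout
keys = [
    "databases/mysql-2022-09-09-10/tarfiles.tar.gz",
    "databases/mysql-2022-09-09-11/tarfiles.tar.gz",
    "databases/mysql-2022-09-09-11/other.tar.gz",
]
print(directory_names_from_keys(keys))
```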

Thank you!

Upvotes: 0

Views: 782

Answers (1)

Anon Coward

Reputation: 10832

You want to list the shared prefixes under a given prefix.

This is supported in the underlying API, though boto3's "resource" object model does not expose the common prefixes for a given resource. To accomplish this, you'll need to use the lower-level "client" interface:

import boto3

prefix = 'databases/mysql-'
s3 = boto3.client('s3')
paginator = s3.get_paginator("list_objects_v2")
# Prefix restricts the listing; Delimiter='/' rolls everything after the next
# '/' up into CommonPrefixes, so only one "directory" level is returned
for page in paginator.paginate(Bucket=S3_BUCKET, Prefix=prefix, Delimiter='/'):
    for common_prefix in page.get("CommonPrefixes", []):
        print(common_prefix['Prefix'])
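Each common prefix comes back as a full path such as `databases/mysql-2022-09-09-10/`. If only the `mysql-<date>-<hour>` part is wanted, as in the question, a small wrapper can strip the parent path and the trailing delimiter. This is a sketch: `list_subdirectory_names` is a hypothetical name, and the client is passed in so it can be called with `boto3.client('s3')` or with a test double.

```python
def list_subdirectory_names(s3_client, bucket, prefix, delimiter="/"):
    """Yield the last path segment of each CommonPrefix under `prefix`.

    `s3_client` is expected to behave like boto3.client('s3'); only
    get_paginator("list_objects_v2") and paginate() are used.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter=delimiter):
        # Pages without any "directories" omit the CommonPrefixes key
        for common_prefix in page.get("CommonPrefixes", []):
            full = common_prefix["Prefix"]  # e.g. "databases/mysql-2022-09-09-10/"
            # Drop the trailing delimiter, then keep only the last segment
            if full.endswith(delimiter):
                full = full[: -len(delimiter)]
            yield full.rsplit(delimiter, 1)[-1]
```

Usage, assuming `S3_BUCKET` is defined as in the question: `for name in list_subdirectory_names(boto3.client("s3"), S3_BUCKET, "databases/mysql-"): print(name)`. The listing cost is the same as the loop above; only the output is trimmed.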

Upvotes: 2
