Reputation: 186

How to get top-level folders in an S3 bucket using boto3?

I have an S3 bucket with a few top level folders, and hundreds of files in each of these folders. How do I get the names of these top level folders?

I have tried the following:

s3 = boto3.resource('s3', region_name='us-west-2', endpoint_url='https://s3.us-west-2.amazonaws.com')
bucket = s3.Bucket('XXX')

for obj in bucket.objects.filter(Prefix='', Delimiter='/'):
    print(obj.key)

But this doesn't seem to work. I have thought about using regex to filter all the folder names, but this doesn't seem time efficient.

Thanks in advance!

Upvotes: 8

Answers (3)

Investigator

Reputation: 1549

You could also use Amazon Athena in order to analyse/query S3 buckets.

https://aws.amazon.com/athena/

Upvotes: 0

Shreyash Solanke

Reputation: 381

Try listing with a delimiter through a paginator and reading the CommonPrefixes entries from the result:

import boto3

client = boto3.client('s3')
paginator = client.get_paginator('list_objects')
result = paginator.paginate(Bucket='my-bucket', Delimiter='/')
for prefix in result.search('CommonPrefixes'):
    print(prefix.get('Prefix'))
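
As a hedged variation on the snippet above, the sketch below collects the prefixes into a list instead of printing them, and skips pages that carry no CommonPrefixes key. The function names top_level_prefixes and list_top_level_folders are made up for illustration, and the use of the list_objects_v2 paginator instead of list_objects is my assumption, not something from the original answer.

```python
def top_level_prefixes(pages):
    """Collect the 'Prefix' value of every CommonPrefixes entry in the pages."""
    prefixes = []
    for page in pages:
        # 'CommonPrefixes' is missing on pages with no rolled-up keys.
        for entry in page.get('CommonPrefixes', []):
            prefixes.append(entry['Prefix'])
    return prefixes


def list_top_level_folders(bucket_name):
    """Hypothetical wrapper: list top-level 'folders' of a real bucket."""
    import boto3  # imported here so the helper above works without AWS access

    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')
    # Delimiter='/' asks S3 to roll up keys sharing a prefix up to the first '/'.
    pages = paginator.paginate(Bucket=bucket_name, Delimiter='/')
    return top_level_prefixes(pages)
```

Calling list_top_level_folders('my-bucket') needs valid AWS credentials; top_level_prefixes can be tried on plain dictionaries shaped like listing pages.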

Upvotes: 18

guest

Reputation: 92

The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does (source)

In other words, there's no way around iterating over all of the keys in the bucket and extracting whatever structure you want to see (depending on your needs, a dict-of-dicts may be a good approach).
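
To illustrate what inferring hierarchy from key name prefixes and delimiters means, here is a minimal sketch that derives top-level folder names from a flat list of keys. The function name top_level_folders and the sample keys are invented for the example. It assumes the keys have already been fetched, which on a large bucket is much slower than letting S3 group them with a Delimiter on the listing call.

```python
def top_level_folders(keys, delimiter='/'):
    """Return the sorted, de-duplicated first path segments of the given keys."""
    folders = set()
    for key in keys:
        head, sep, _ = key.partition(delimiter)
        if sep:  # the key contains the delimiter, so it sits under a "folder"
            folders.add(head + delimiter)
    return sorted(folders)


# Sample keys are hypothetical; 'readme.txt' has no delimiter, so it is skipped.
keys = ['logs/2020/a.txt', 'logs/b.txt', 'images/cat.png', 'readme.txt']
print(top_level_folders(keys))
```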

Upvotes: -1
