Reputation: 5126
I have an AWS S3 key path bucket-name/fo1/fo2/fo3 that has subpaths such as bucket-name/fo1/fo2/fo3/fo_1, bucket-name/fo1/fo2/fo3/fo_2, bucket-name/fo1/fo2/fo3/fo_3, and so on. I want to iterate over these keys fo_1, fo_2, fo_3, etc. within the path bucket-name/fo1/fo2/fo3.
I tried the following, but it doesn't work:
import boto3

s3 = boto3.client('s3')
s3_bucket = 'bucket-name'
prefix = 'fo1/fo2/fo3'

for obj in s3.list_objects_v2(Bucket=s3_bucket, Prefix=prefix, Delimiter='/'):
    # Here, when I print obj, it's a string with the value 'MaxKeys'
    print(obj)
Any help will be appreciated!
UPDATE:
s3://bucket-name/
    fo1/
        fo2/
            fo3/
                fo_1/
                    file1
                    ...
                fo_2/
                    file2
                    ...
                fo_3/
                    file1
                    ...
                fo_4/
                    file1
                    ...
                ...
This is my structure, and I am looking to get fo_1, fo_2, fo_3 and the files inside them. I want everything inside fo3 and nothing outside of it.
Upvotes: 1
Views: 2392
Reputation: 201
Possibly the following piece of code can be of use to you. I expanded a bit on John's answer, as I was looking for something similar. I basically recreated the os.walk() behavior, which you might be more familiar with.
import boto3

# function to replicate os.walk behavior
def s3walk(locations, prefix):
    # recursively add each location to root, starting from prefix
    def processLocation(root, prefixLocal, location):
        # add a new root entry if not yet present
        if prefixLocal not in root:
            root[prefixLocal] = (set(), set())
        # check how many folders remain after the prefix
        remainder = location[len(prefixLocal) + 1:]
        structure = remainder.split('/')
        # if we are not yet in the folder of the file, continue with a larger prefix
        if len(structure) > 1:
            # add the folder name
            root[prefixLocal][0].add(structure[0])
            # make sure the file is added along the way
            processLocation(root, prefixLocal + '/' + structure[0], location)
        else:
            # add the file name
            root[prefixLocal][1].add(structure[0])

    root = {}
    for location in locations:
        processLocation(root, prefix, location)
    return root.items()


if __name__ == "__main__":
    s3_client = boto3.client('s3', region_name='eu-west-3')
    s3_bucket = 'bucket-name'
    prefix = 'fo1/fo2/fo3'

    # get the list of objects under the prefix
    response = s3_client.list_objects_v2(Bucket=s3_bucket, Prefix=prefix)

    # retrieve the key values
    locations = [object['Key'] for object in response['Contents']]

    for root, (subdir, files) in s3walk(locations, prefix):
        print(root, subdir, files)
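Note that list_objects_v2() returns at most 1,000 keys per call, so the snippet above only sees the first page of results. Here is a minimal sketch using boto3's paginator to collect every key under the prefix before walking it (same placeholder bucket and prefix as above):

import boto3

s3_client = boto3.client('s3', region_name='eu-west-3')
s3_bucket = 'bucket-name'
prefix = 'fo1/fo2/fo3'

# the paginator keeps issuing list_objects_v2 calls until all pages are fetched
paginator = s3_client.get_paginator('list_objects_v2')
locations = []
for page in paginator.paginate(Bucket=s3_bucket, Prefix=prefix):
    # a page with no matching keys carries no 'Contents' entry
    locations.extend(obj['Key'] for obj in page.get('Contents', []))

for root, (subdir, files) in s3walk(locations, prefix):
    print(root, subdir, files)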
Upvotes: 0
Reputation: 269320
You should examine the value returned by the list_objects_v2() call to understand the data that is being returned. In particular, when you pass a Delimiter, the "folder" names are returned under CommonPrefixes rather than in Contents.
import boto3

s3_client = boto3.client('s3', region_name='ap-southeast-2')
s3_bucket = 'my-bucket'
prefix = 'fo1/fo2/fo3/'

response = s3_client.list_objects_v2(Bucket=s3_bucket)
for object in response['Contents']:
    if object['Key'].startswith(prefix):
        print(object['Key'])
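If you want the "subfolder" names themselves (fo_1, fo_2, ...) rather than the full keys, here is a minimal sketch using a Delimiter, with the same placeholder bucket and prefix, where the grouped results come back under CommonPrefixes:

import boto3

s3_client = boto3.client('s3', region_name='ap-southeast-2')
s3_bucket = 'my-bucket'
prefix = 'fo1/fo2/fo3/'  # note the trailing slash

# with a Delimiter, keys are grouped at the next '/' after the Prefix,
# and each group appears once under 'CommonPrefixes'
response = s3_client.list_objects_v2(Bucket=s3_bucket, Prefix=prefix, Delimiter='/')
for common_prefix in response.get('CommonPrefixes', []):
    # e.g. 'fo1/fo2/fo3/fo_1/' -> 'fo_1'
    print(common_prefix['Prefix'][len(prefix):].rstrip('/'))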
Upvotes: 0
Reputation: 269320
The first thing to understand about Amazon S3 is that folders do not exist. Rather, objects are stored with their full path as their Key (filename).
For example, I could copy a file to a bucket using the AWS Command-Line Interface (CLI):
aws s3 cp foo.txt s3://my-bucket/fo1/fo2/fo3/foo.txt
This would work even though the folders do not exist.
To make things convenient for humans, there is a "pretend" set of folders that are provided via the concept of a common prefix. Thus, in the management console, the folders would appear to be there. However, if the object was then deleted with:
aws s3 rm s3://my-bucket/fo1/fo2/fo3/foo.txt
The result is that the folders would immediately disappear because they never actually existed!
Also for convenience, some Amazon S3 commands allow you to specify a Prefix and Delimiter. This can be used to, for example, only list objects in the fo3 folder. What it is really doing is merely listing the objects that have a Key starting with fo1/fo2/fo3/. When the Key of an object is returned, it will always contain the full path to the object, because the Key actually is the full path. (There is no concept of a filename separate from the complete Key.)
So, if you want a listing of all files in fo1 and fo2 and fo3, you can do a listing with a Prefix of fo1 and receive back all objects whose keys start with fo1/, but this will include objects in sub-folders, since they all share the fo1/ prefix.
Bottom line: Rather than thinking of old-fashioned directories, think of Amazon S3 as a flat storage structure. If necessary, you can do filtering of results in your own code.
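As a rough sketch of that filtering approach, assuming the placeholder bucket and path names from the question: list every object under fo1/fo2/fo3/ and split the keys in your own code.

import boto3

s3_client = boto3.client('s3')
s3_bucket = 'bucket-name'
prefix = 'fo1/fo2/fo3/'

# treat the bucket as flat storage: fetch every key under the prefix
# (the paginator handles listings larger than 1,000 keys)
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=s3_bucket, Prefix=prefix):
    for obj in page.get('Contents', []):
        key = obj['Key']                    # e.g. 'fo1/fo2/fo3/fo_1/file1'
        relative = key[len(prefix):]        # e.g. 'fo_1/file1'
        subfolder = relative.split('/')[0]  # e.g. 'fo_1'
        print(subfolder, key)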
Upvotes: 3