Reputation: 714
I'm trying to list all so-called folders and sub-folders in an S3 bucket.
Because I want to list all the folders under a path recursively, I am not using the `withDelimiter()` function.
All the so-called folder names should end with `/`, and this is my logic to list all the folders and sub-folders.
Here's the Scala code (the catch block is intentionally left out):
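```scala
import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.ListObjectsRequest
import scala.collection.JavaConverters._

val awsCredentials = new BasicAWSCredentials(awsKey, awsSecretKey)
val client = new AmazonS3Client(awsCredentials)

def listFoldersRecursively(bucketName: String, fullPath: String): List[String] = {
  try {
    // List the objects whose keys start with fullPath (no delimiter, so every level)
    val listObjectsRequest = new ListObjectsRequest()
      .withPrefix(fullPath)
      .withBucketName(bucketName)
    val folderPaths = client
      .listObjects(listObjectsRequest)
      .getObjectSummaries
      .asScala
      .map(_.getKey)
    // Keep only the keys that look like "folders"
    folderPaths.filter(_.endsWith("/")).toList
  } // catch block omitted
}
```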
Here's the structure of my bucket as seen in an S3 client:
[screenshot not available]
Here's the list I am getting with this Scala code:
[screenshot not available]
Without any apparent pattern, many folders are missing from the list of retrieved folders. I did not use `client.listObjects(listObjectsRequest).getCommonPrefixes.toList` because it was returning an empty list for some reason.
P.S.: I couldn't add the screenshots to the post directly because I'm a new user.
Upvotes: 4
Views: 4901
Reputation: 714
Well, in case someone faces the same problem in the future: the alternative logic I used is the one suggested by @Michael above. I iterated through all the keys and split each one at the last occurrence of `/`. The part before that `/`, plus a trailing `/`, is the key of a folder, which I appended to another list. At the end, I returned the distinct entries of that list. This gave me all the folders and sub-folders under a given prefix.
Note that I didn't use `CommonPrefixes` because I wasn't using any delimiter, and that's because I didn't want the list of folders at one level but instead wanted to get all the folders and sub-folders recursively.
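```scala
import com.amazonaws.services.s3.model.ListObjectsRequest
import scala.collection.JavaConverters._
import scala.collection.mutable.ArrayBuffer

// `client` is the AmazonS3Client created in the question
def listFoldersRecursively(bucketName: String, fullPath: String): List[String] = {
  try {
    val listObjectsRequest = new ListObjectsRequest()
      .withPrefix(fullPath)
      .withBucketName(bucketName)
    // Every key under fullPath (a single call returns one page of at most 1,000 keys)
    val folderPaths = client.listObjects(listObjectsRequest)
      .getObjectSummaries
      .asScala
      .map(_.getKey)
      .toList
    val foldersList: ArrayBuffer[String] = ArrayBuffer()
    for (folderPath <- folderPaths) {
      // Split at the last "/": the left part is the key's parent "folder"
      val split = folderPath.splitAt(folderPath.lastIndexOf("/"))
      if (split._1.nonEmpty)
        foldersList += split._1 + "/"
    }
    foldersList.toList.distinct
  }
}
```
P.S.: The catch block is intentionally omitted because it isn't relevant here.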
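One caveat with the loop above: it records only each key's immediate parent, so a folder that contains nothing but sub-folders (for example `a/` when the only key is `a/b/c.txt`) is not listed. Below is a minimal variant that records every intermediate prefix. It is written as a pure function over a list of keys so it can be tried without S3; the name `allFolderPrefixes` and its `prefix` filter are illustrative, not part of the original answer.

```scala
object FolderPrefixes {
  /** Every "folder" implied by the given keys, including intermediate levels.
    * For "a/b/c.txt" this yields both "a/" and "a/b/". Only prefixes that start
    * with `prefix` are kept, so ancestors of the listing prefix itself are dropped.
    */
  def allFolderPrefixes(keys: Seq[String], prefix: String = ""): List[String] =
    keys
      .flatMap { key =>
        // One candidate per "/" in the key: everything up to and including it
        key.indices.filter(i => key.charAt(i) == '/').map(i => key.substring(0, i + 1))
      }
      .filter(_.startsWith(prefix))
      .distinct
      .sorted
      .toList
}
```

In the answer's function, it could stand in for the `for` loop as `FolderPrefixes.allFolderPrefixes(folderPaths, fullPath)`.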
Upvotes: 5
Reputation: 179364
> Without any apparent pattern, many folders are missing from the list of retrieved folders.
Here's your problem: you are assuming there should always be objects with keys ending in `/` to symbolize folders.
This is an incorrect assumption. They will only be there if you created them, either via the S3 console or the API. There's no reason to expect them, as S3 doesn't actually need them or use them for anything, and the S3 service does not create them on its own.
If you use the API to upload an object with key `foo/bar.txt`, this does not create the `foo/` folder as a distinct object. It will appear as a folder in the console for convenience, but it isn't there unless at some point you deliberately created it.
Of course, the only way to upload such an object with the console is to "create" the folder first, unless it already appears. But "appears in the console" does not necessarily mean "exists as a distinct object".
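A tiny in-memory illustration of that point. The keys are made up: `apiOnly` stands for objects uploaded through the API, and `withMarker` for the same layout after someone also created `foo/` in the console. The filter only finds folders that exist as marker objects, which is why the result looks patternless.

```scala
object EndsWithDemo {
  // Hypothetical keys uploaded via the API: no "foo/" or "foo/baz/" marker objects exist.
  val apiOnly: List[String] = List("foo/bar.txt", "foo/baz/qux.txt")
  // Same layout, plus a "foo/" marker object created from the console.
  val withMarker: List[String] = "foo/" :: apiOnly

  // The question's filter: keep only keys that literally end in "/".
  def markerFolders(keys: List[String]): List[String] = keys.filter(_.endsWith("/"))

  def main(args: Array[String]): Unit = {
    println(markerFolders(apiOnly))    // List()
    println(markerFolders(withMarker)) // List(foo/)
  }
}
```

In both cases `foo/baz/` is missing, even though the console would display it as a folder.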
Filtering on `endsWith("/")` is invalid logic.
This is why the underlying API includes `CommonPrefixes` in each ListObjects response when a `delimiter` is specified (together with a `prefix` to start below the root of the bucket). This is a list of the next level of "folders", which you have to drill down into recursively in order to find the level below.
> If you specify a prefix, all keys that contain the same string between the prefix and the first occurrence of the delimiter after the prefix are grouped under a single result element called CommonPrefixes. If you don't specify the prefix parameter, the substring starts at the beginning of the key. The keys that are grouped under the CommonPrefixes result element are not returned elsewhere in the response.
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html
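To make the quoted grouping rule and the recursive drill-down concrete, here is an in-memory sketch. `commonPrefixes` imitates the rule on a plain list of keys and `walkFolders` applies it level by level. Both names are hypothetical, and nothing here calls S3. Against S3 with the AWS SDK for Java v1, `listLevel` would presumably wrap a `ListObjectsRequest` built with `withPrefix(p)` and `withDelimiter("/")`, read `getCommonPrefixes`, and keep fetching pages while the listing `isTruncated`; that wiring is not shown.

```scala
import scala.annotation.tailrec

object CommonPrefixWalk {
  /** Imitation of the CommonPrefixes rule on an in-memory key list: for keys
    * starting with `prefix`, keep the text up to and including the first
    * `delimiter` after the prefix. Keys with no delimiter after the prefix are
    * not grouped. With no delimiter there are no CommonPrefixes at all.
    */
  def commonPrefixes(keys: Seq[String], prefix: String, delimiter: String): List[String] =
    if (delimiter.isEmpty) Nil
    else
      keys
        .filter(_.startsWith(prefix))
        .flatMap { key =>
          val at = key.indexOf(delimiter, prefix.length)
          if (at >= 0) Some(key.substring(0, at + delimiter.length)) else None
        }
        .distinct
        .toList

  /** Depth-first drill-down: `listLevel(p)` returns the CommonPrefixes of a
    * listing made with prefix `p` and delimiter "/" (all pages combined).
    */
  def walkFolders(root: String, listLevel: String => Seq[String]): List[String] = {
    @tailrec
    def loop(pending: List[String], found: List[String]): List[String] =
      pending match {
        case Nil => found.reverse
        case folder :: rest =>
          // Record this folder, then visit its children before its siblings.
          loop(listLevel(folder).toList ++ rest, folder :: found)
      }
    loop(listLevel(root).toList, Nil)
  }
}
```

With the keys `a/b/c.txt`, `a/d.txt`, `e/f.txt` and `top.txt`, `walkFolders("", p => commonPrefixes(keys, p, "/"))` yields `a/`, `a/b/` and `e/` even though no key ends in `/`. With an empty delimiter, `commonPrefixes` returns nothing, mirroring the empty `getCommonPrefixes` result described in the question.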
You need to access this functionality with whatever library you are using, or you need to iterate over the entire list of keys and discover the actual common prefixes on `/` boundaries using string splitting.
Upvotes: 6
Reputation: 1140
The `listObjects` function (and others) paginates, returning up to 1,000 entries per call.
From the doc:
> Because buckets can contain a virtually unlimited number of keys, the complete results of a list query can be extremely large. To manage large result sets, Amazon S3 uses pagination to split them into multiple responses. Always check the ObjectListing.isTruncated() method to see if the returned listing is complete or if additional calls are needed to get more results. Alternatively, use the AmazonS3Client.listNextBatchOfObjects(ObjectListing) method as an easy way to get the next page of object listings.
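A minimal sketch of the loop the documentation describes. It is written generically over the page type so it can run without S3; `allPages` is a hypothetical helper, not part of the SDK.

```scala
import scala.annotation.tailrec

object Paging {
  /** Returns `first` followed by every later page, requesting the next page
    * for as long as `isTruncated` reports that the current page is incomplete.
    */
  def allPages[P](first: P)(isTruncated: P => Boolean, nextPage: P => P): List[P] = {
    @tailrec
    def loop(current: P, acc: List[P]): List[P] =
      if (isTruncated(current)) loop(nextPage(current), current :: acc)
      else (current :: acc).reverse
    loop(first, Nil)
  }
}
```

With the AWS SDK for Java v1 (and `scala.collection.JavaConverters._` imported), the page type would be `ObjectListing`, and collecting every key under a prefix might look like `Paging.allPages(client.listObjects(listObjectsRequest))(_.isTruncated, client.listNextBatchOfObjects(_)).flatMap(_.getObjectSummaries.asScala.map(_.getKey))`. Without such a loop, a single `listObjects` call sees only the first page.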
Upvotes: 2