Reputation: 714

Amazon S3 Client NOT listing all folders in the bucket

I'm trying to list all the so-called folders and sub-folders in an S3 bucket. Because I want to list every folder under a path recursively, I am not using the withDelimiter() function. All the so-called folder names should end with /, and that is the basis of my logic for listing the folders and sub-folders.

Here's the Scala code (intentionally not pasting the catch block here):

val awsCredentials = new BasicAWSCredentials(awsKey, awsSecretKey)
val client = new AmazonS3Client(awsCredentials)
def listFoldersRecursively(bucketName: String, fullPath: String): List[String] = {
  try {
    val objects = client.listObjects(bucketName).getObjectSummaries
    val listObjectsRequest = new ListObjectsRequest()
      .withPrefix(fullPath)
      .withBucketName(bucketName)
    val folderPaths = client
      .listObjects(listObjectsRequest)
      .getObjectSummaries()
      .map(_.getKey)
    folderPaths.filter(_.endsWith("/")).toList
  }
}

Here's the structure of my bucket through an s3 client

Here's the list I am getting using this scala code

Without any apparent pattern, many folders are missing from the list of retrieved folders. I did not use

client.listObjects(listObjectsRequest).getCommonPrefixes.toList

because it returned an empty list for some reason.

P.S: Couldn't add photos in post directly because of being a new user.

Upvotes: 4

Answers (3)

saadi

Reputation: 714

In case someone faces the same problem in the future: the alternative logic I used is the one suggested by @Michael. I iterated through all the keys and split each one at the last occurrence of /. The first element of the resulting pair, plus /, is the key of a folder, which I appended to another list. At the end, I returned the distinct entries of that list. This gave me all the folders and sub-folders under a given prefix.

Note that I didn't use CommonPrefixes because I wasn't using any delimiter. I didn't want the list of folders at one particular level; I wanted to get all the folders and sub-folders recursively.

import scala.collection.JavaConverters._
import scala.collection.mutable.ArrayBuffer

def listFoldersRecursively(bucketName: String, fullPath: String): List[String] = {
    try {
      val listObjectsRequest = new ListObjectsRequest()
        .withPrefix(fullPath)
        .withBucketName(bucketName)

      // Note: a single listObjects call returns only the first page of results.
      val keys = client.listObjects(listObjectsRequest)
        .getObjectSummaries()
        .asScala
        .map(_.getKey)
        .toList

      val foldersList: ArrayBuffer[String] = ArrayBuffer()
      for (key <- keys) {
        // Everything before the last "/" is the key's parent folder.
        val split = key.splitAt(key.lastIndexOf("/"))
        if (!split._1.equals(""))
          foldersList += split._1 + "/"
      }
      foldersList.toList.distinct
    }
    // catch block omitted
}

P.S.: The catch block is intentionally omitted because it is not relevant here.

Upvotes: 5

Michael - sqlbot

Reputation: 179364

Without any apparent pattern, many folders are missing from the list of retrieved folders.

Here's your problem: you are assuming there should always be objects with keys ending in / to symbolize folders.

This is an incorrect assumption. They will only be there if you created them, either via the S3 console or the API. There's no reason to expect them, as S3 doesn't actually need them or use them for anything, and the S3 service does not create them spontaneously, itself.

If you use the API to upload an object with key foo/bar.txt, this does not create the foo/ folder as a distinct object. It will appear as a folder in the console for convenience, but it isn't there unless at some point you deliberately created it.

Of course, the only way to upload such an object with the console is to "create" the folder unless it already appears -- but "appears in the console" does not necessarily equate to "exists as a distinct object".

Filtering on endsWith("/") is invalid logic.

This is why the underlying API includes CommonPrefixes with each ListObjects response if delimiter and prefix are specified. This is a list of the next level of "folders", which you have to recursively drill down into in order to find the next level.

If you specify a prefix, all keys that contain the same string between the prefix and the first occurrence of the delimiter after the prefix are grouped under a single result element called CommonPrefixes. If you don't specify the prefix parameter, the substring starts at the beginning of the key. The keys that are grouped under the CommonPrefixes result element are not returned elsewhere in the response.

https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html
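To make the grouping rule quoted above concrete, here is a minimal in-memory sketch of it in Scala. It does not call S3 at all: CommonPrefixesSketch, listLevel and allFolders are hypothetical names invented for illustration, and the list of keys stands in for the contents of a bucket. It only mimics how keys are split between CommonPrefixes and the ordinary key list for one level, and then drills down level by level.

```scala
object CommonPrefixesSketch {
  // Mimics one ListObjects response with prefix and delimiter set (illustrative only).
  // Returns (commonPrefixes, keysReturnedDirectly) for the level just below `prefix`.
  def listLevel(keys: Seq[String], prefix: String, delimiter: String = "/"): (List[String], List[String]) = {
    val matching = keys.filter(_.startsWith(prefix))
    // A key is grouped if the delimiter occurs somewhere after the prefix.
    val (grouped, direct) = matching.partition(_.indexOf(delimiter, prefix.length) >= 0)
    val commonPrefixes = grouped.map { key =>
      val end = key.indexOf(delimiter, prefix.length) + delimiter.length
      key.substring(0, end) // prefix up to and including the first delimiter after it
    }.distinct.sorted.toList
    (commonPrefixes, direct.sorted.toList)
  }

  // Recursively drills into each common prefix to collect every "folder".
  def allFolders(keys: Seq[String], prefix: String): List[String] = {
    val (commonPrefixes, _) = listLevel(keys, prefix)
    commonPrefixes.flatMap(p => p :: allFolders(keys, p))
  }
}
```

For the keys foo/bar.txt and foo/baz/qux.txt, listLevel with an empty prefix reports foo/ as the only common prefix, even though no object named foo/ exists. Against a real bucket, each call to listLevel would correspond to one request with the delimiter set.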

You need to access this functionality with whatever library you are using, or you need to iterate over the entire list of keys and discover the actual common prefixes on / boundaries using string splitting.
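The string-splitting option could look roughly like the sketch below. It assumes you have already collected the complete list of keys (across all pages) into a Scala collection; FolderPrefixes, ancestors and foldersOf are hypothetical names, not part of any SDK. Unlike splitting only at the last /, it emits every ancestor prefix of each key, so intermediate folders that contain no objects directly are still found.

```scala
object FolderPrefixes {
  // Every ancestor "folder" prefix of a key, e.g. "a/b/c.txt" yields "a/" and "a/b/".
  def ancestors(key: String, delimiter: Char = '/'): List[String] =
    key.indices
      .filter(i => key(i) == delimiter)   // positions of each delimiter
      .map(i => key.substring(0, i + 1))  // prefix up to and including it
      .toList

  // Distinct, sorted folder prefixes implied by a collection of keys.
  def foldersOf(keys: Seq[String]): List[String] =
    keys.flatMap(k => ancestors(k)).distinct.sorted.toList
}
```

Keys without any / (objects at the top level of the bucket) contribute no folders, and a placeholder object such as a/ simply contributes the prefix a/ once.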

Upvotes: 6

Michael Yakobi

Reputation: 1140

The listObjects function (and others) is paginated, returning up to 1,000 entries per call.

From the doc:

Because buckets can contain a virtually unlimited number of keys, the complete results of a list query can be extremely large. To manage large result sets, Amazon S3 uses pagination to split them into multiple responses. Always check the ObjectListing.isTruncated() method to see if the returned listing is complete or if additional calls are needed to get more results. Alternatively, use the AmazonS3Client.listNextBatchOfObjects(ObjectListing) method as an easy way to get the next page of object listings.
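The loop described in the doc can be sketched generically as below. PaginationSketch and collectAll are hypothetical names, and the page type is left abstract so the sketch runs without the SDK. The commented wiring to the AWS SDK for Java v1 is an untested assumption: it relies on the isTruncated and listNextBatchOfObjects methods named in the quote, and on the asScala converters from scala.collection.JavaConverters.

```scala
import scala.collection.mutable.ListBuffer

object PaginationSketch {
  // Collects items from the first page and every following page
  // until a page reports that the listing is no longer truncated.
  def collectAll[P, A](firstPage: P)(
      items: P => Seq[A],
      isTruncated: P => Boolean,
      nextPage: P => P): List[A] = {
    val buffer = ListBuffer.empty[A]
    var page = firstPage
    buffer ++= items(page)
    while (isTruncated(page)) {
      page = nextPage(page)
      buffer ++= items(page)
    }
    buffer.toList
  }
}

// Assumed wiring for the SDK (not exercised here):
//   PaginationSketch.collectAll[ObjectListing, String](client.listObjects(listObjectsRequest))(
//     _.getObjectSummaries.asScala.map(_.getKey).toSeq,
//     _.isTruncated,
//     listing => client.listNextBatchOfObjects(listing))
```

Without a loop like this, any keys beyond the first page never reach the folder-detection logic, which would also make folders appear to be missing.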

Upvotes: 2
