sc_ray

Reputation: 8043

Listing just the sub folders in an s3 bucket

I have an s3 structure as follows:

s3bucketname -> List of first level keys -> List of second level keys -> List of third level keys -> Actual file.

Given the bucket name and one first-level key, I need the names of all the second-level keys directly under that first-level key. In folder terms: I am given the root folder (the bucket, s3bucketname) and one of its subfolders, subfolder1, and I would like to list all the folders that sit directly inside subfolder1. I only need the names, not the complete paths.

Can somebody point out how to do this in Java using Amazon's Java SDK?

Thanks

Upvotes: 20

Views: 44379

Answers (3)

Sma Ma

Reputation: 3685

The important thing is that there is no real folder concept in S3. But let's see which tricks are possible with the S3 API.

This example (Python with boto3) lists all "subfolders" (key prefixes) directly under a specific "folder" named "lala", without descending recursively into those subfolders.

Prefix="lala/" and Delimiter="/" parameters do the magic.

This solution also uses the S3 paginator API, so it collects all results even when the listing spans more than 1000 entries, which is the most a single list_objects_v2 response returns. The paginator automatically fetches the following pages (entries 1001 to 2000, and so on).

# given "folder/key" structure
# .
# ├── lorem.txt
# ├─── lala
# │ ├── folder1
# │ │    ├── file1.txt
# │ │    └── file2.txt
# │ ├── folder2
# │ │    └── file1.txt
# │ └── folder3
# │      └── file1.txt
# └── lorem
#   └── folder4
#        ├── file1.txt
#        └── file2.txt

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

# Execute paginated list_objects_v2
response = paginator.paginate(Bucket="your-bucket-name", Prefix="lala/", Delimiter="/")

# Get prefix for each page result
names = []
for page in response:
    names.extend([x["Prefix"] for x in page.get("CommonPrefixes", [])])

print(names)
# Result is:
# ['lala/folder1/','lala/folder2/','lala/folder3/']
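
To make the grouping concrete, here is a minimal in-memory sketch in Java (the language the question asks about) of how a prefix plus a delimiter collapses keys into common prefixes. It does not call S3 at all; the class name CommonPrefixSketch and the method commonPrefixes are hypothetical names chosen for illustration, and the key list simply mirrors the tree above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CommonPrefixSketch {
    // Hypothetical helper, not part of any AWS SDK: mimics in memory how a
    // Prefix + Delimiter listing groups keys into common prefixes.
    static List<String> commonPrefixes(List<String> keys, String prefix, String delimiter) {
        Set<String> result = new LinkedHashSet<>(); // de-duplicates, keeps first-seen order
        for (String key : keys) {
            if (!key.startsWith(prefix)) {
                continue; // key lies outside the requested "folder"
            }
            // Find the first delimiter that appears after the prefix.
            int idx = key.indexOf(delimiter, prefix.length());
            if (idx >= 0) {
                // Everything up to and including that delimiter is one "subfolder".
                result.add(key.substring(0, idx + delimiter.length()));
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "lorem.txt",
                "lala/folder1/file1.txt",
                "lala/folder1/file2.txt",
                "lala/folder2/file1.txt",
                "lala/folder3/file1.txt",
                "lorem/folder4/file1.txt",
                "lorem/folder4/file2.txt");
        // prints [lala/folder1/, lala/folder2/, lala/folder3/]
        System.out.println(commonPrefixes(keys, "lala/", "/"));
    }
}
```

Keys directly under the prefix with no further delimiter (plain files) are skipped here, since a real listing reports those as objects rather than as common prefixes.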

Upvotes: 1

lazywiz

Reputation: 1111

Charles's version is super concise! Thanks @charles-menguy.

I wrote an extension that supports very large listings through pagination.

    // Assumes FILE_DELIMITER (presumably "/") and s3Client (an AWS SDK for Java v1
    // AmazonS3 client) are fields defined elsewhere in the class.
    public List<String> getSubPathsInS3Prefix(String bucketName, String prefix) {
        if (!prefix.endsWith(FILE_DELIMITER)) {
            prefix += FILE_DELIMITER;
        }
        List<String> paths = new ArrayList<String>();
        ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
                .withBucketName(bucketName).withPrefix(prefix)
                .withMaxKeys(1000).withDelimiter(FILE_DELIMITER);
        ObjectListing currentListing = s3Client.listObjects(listObjectsRequest);
        paths.addAll(currentListing.getCommonPrefixes());

        // Keep fetching batches while S3 reports the listing as truncated.
        // (The original condition also tested for null, which would have caused a
        // NullPointerException on the next call rather than ending the loop.)
        while (currentListing.isTruncated()) {
            currentListing = s3Client.listNextBatchOfObjects(currentListing);
            paths.addAll(currentListing.getCommonPrefixes());
        }
        return paths;
    }

http://www.lazywiz.com/uncategorized/s3-missing-api-list-sub-paths-in-the-s3-bucket/

Upvotes: 9

Charles Menguy

Reputation: 41428

I wrote the following code, which seems to work fine. You have to pass a prefix, make sure it ends with /, and specify the delimiter; the common prefixes in the response are then your list of sub-directories. The following should work:

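// Assumes _client is an AmazonS3 client (AWS SDK for Java v1) defined elsewhere in the class.
public List<String> listKeysInDirectory(String bucketName, String prefix) {
    String delimiter = "/";
    if (!prefix.endsWith(delimiter)) {
        prefix += delimiter;
    }

    ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
            .withBucketName(bucketName).withPrefix(prefix)
            .withDelimiter(delimiter);
    // A single call returns one batch only; check isTruncated() for very large listings.
    ObjectListing objects = _client.listObjects(listObjectsRequest);
    // Each common prefix is a full key prefix ending with the delimiter, e.g. "prefix/sub/".
    return objects.getCommonPrefixes();
}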
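
The question asks for just the names, not the complete paths, while the common prefixes come back as full key prefixes such as "subfolder1/name/". A minimal sketch of the remaining step, assuming that shape of input: strip the parent prefix and the trailing delimiter. FolderNames and toNames are hypothetical names used only for this illustration; nothing here touches the SDK.

```java
import java.util.ArrayList;
import java.util.List;

class FolderNames {
    // Hypothetical helper: turns "subfolder1/a/" into "a" for the parent prefix "subfolder1/".
    static List<String> toNames(List<String> commonPrefixes, String prefix, String delimiter) {
        List<String> names = new ArrayList<>();
        for (String commonPrefix : commonPrefixes) {
            // Drop the parent prefix if present.
            String name = commonPrefix.startsWith(prefix)
                    ? commonPrefix.substring(prefix.length())
                    : commonPrefix;
            // Drop the trailing delimiter if present.
            if (name.endsWith(delimiter)) {
                name = name.substring(0, name.length() - delimiter.length());
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> prefixes = List.of("lala/folder1/", "lala/folder2/", "lala/folder3/");
        // prints [folder1, folder2, folder3]
        System.out.println(toNames(prefixes, "lala/", "/"));
    }
}
```

In practice the input list would be the result of a method like listKeysInDirectory above, called with the same prefix and delimiter.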

Upvotes: 55
