Reputation: 8043
I have an S3 structure as follows:
s3bucketname -> List of first level keys -> List of second level keys -> List of third level keys -> Actual file.
Given the name of the S3 bucket and an entry for the first level key, I need the names of all the second level keys that reside under that first level key. So essentially, if we look at it like a folder structure, I am given the name of the root folder (the s3bucketname) and the name of one of its subfolders (subfolder1), and I would like to list all the folders that reside within subfolder1. Just the names though, not the complete path.
Can somebody point out how to do this in Java using Amazon's Java SDK?
Thanks
Upvotes: 20
Views: 44379
Reputation: 3685
The important thing is that there is no real folder concept in S3. But let's see which tricks are possible with the S3 API.
In this example, all "subfolders" (keys) under a specific "folder" named "lala" are listed, without recursing into those subfolders.
The Prefix="lala/" and Delimiter="/" parameters do the magic.
In addition, this solution uses the S3 paginator API, so it collects all results even if there are more than 1000 objects: the paginator automatically fetches the next pages of results (1001 to 2000, and so on).
# given "folder/key" structure
# .
# ├── lorem.txt
# ├─── lala
# │ ├── folder1
# │ │ ├── file1.txt
# │ │ └── file2.txt
# │ ├── folder2
# │ │ └── file1.txt
# │ └── folder3
# │ └── file1.txt
# └── lorem
# └── folder4
# ├── file1.txt
# └── file2.txt
import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

# Execute paginated list_objects_v2
response = paginator.paginate(Bucket="your-bucket-name", Prefix="lala/", Delimiter="/")

# Collect the common prefixes ("subfolders") from each page
names = []
for page in response:
    names.extend([x["Prefix"] for x in page.get("CommonPrefixes", [])])

print(names)
# Result is:
# ['lala/folder1/', 'lala/folder2/', 'lala/folder3/']
Upvotes: 1
Reputation: 1111
Charles's version is super concise! Thanks @charles-menguy.
I wrote an extension to support huge listings through pagination.
public List<String> getSubPathsInS3Prefix(String bucketName, String prefix) {
if (!prefix.endsWith(FILE_DELIMITER)) {
prefix += FILE_DELIMITER;
}
List<String> paths = new ArrayList<String>();
ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
.withBucketName(bucketName).withPrefix(prefix)
.withMaxKeys(1000).withDelimiter(FILE_DELIMITER);
ObjectListing currentListing = s3Client.listObjects(listObjectsRequest);
paths.addAll(currentListing.getCommonPrefixes());
while (currentListing == null || currentListing.isTruncated()) {
currentListing = s3Client.listNextBatchOfObjects(currentListing);
paths.addAll(currentListing.getCommonPrefixes());
}
return paths;
}
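For completeness, a hypothetical call (assuming FILE_DELIMITER is "/" and s3Client is an initialized AmazonS3 client; the bucket and prefix names are placeholders) would look like this:
// Hypothetical usage sketch
List<String> subPaths = getSubPathsInS3Prefix("s3bucketname", "subfolder1");
// subPaths now holds full prefixes such as "subfolder1/subfolder2/"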
http://www.lazywiz.com/uncategorized/s3-missing-api-list-sub-paths-in-the-s3-bucket/
Upvotes: 9
Reputation: 41428
I wrote the following code, which seems to work fine. You have to pass a prefix and make sure it ends with /, and also specify the delimiter you want, to get your list of sub-directories. The following should work:
public List<String> listKeysInDirectory(String bucketName, String prefix) {
String delimiter = "/";
if (!prefix.endsWith(delimiter)) {
prefix += delimiter;
}
ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
.withBucketName(bucketName).withPrefix(prefix)
.withDelimiter(delimiter);
ObjectListing objects = _client.listObjects(listObjectsRequest);
return objects.getCommonPrefixes();
}
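Note that getCommonPrefixes() returns full prefixes such as "subfolder1/subfolder2/" rather than bare names. Since the question asks for just the names, a minimal sketch for trimming them could look like this (the helper name is my own, not part of the SDK):
// Hypothetical helper: reduce "subfolder1/subfolder2/" to "subfolder2"
public List<String> listDirectoryNames(String bucketName, String prefix) {
    List<String> names = new ArrayList<String>();
    for (String commonPrefix : listKeysInDirectory(bucketName, prefix)) {
        // Drop the trailing "/" and keep everything after the last remaining "/"
        String trimmed = commonPrefix.substring(0, commonPrefix.length() - 1);
        names.add(trimmed.substring(trimmed.lastIndexOf('/') + 1));
    }
    return names;
}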
Upvotes: 55