Reputation: 426
I need help to ‘recursively’ grab files in s3:
For example, I have s3 structure like this:
My-bucket/2018/06/05/10/file1.json
My-bucket/2018/06/05/11/file2.json
My-bucket/2018/06/05/12/file3.json
My-bucket/2018/06/05/13/file5.json
My-bucket/2018/06/05/14/file4.json
My-bucket/2018/06/05/15/file6.json
I need to get all files pathes with file name for given bucket:
I tried following method, but it didn’t worked for me (its returning not whole path):
public List<String> getObjectsListFromFolder4(String bucketName, String keyPrefix) {
List<String> paths = new ArrayList<String>();
String delimiter = "/";
if (keyPrefix != null && !keyPrefix.isEmpty() && !keyPrefix.endsWith(delimiter)) {
keyPrefix += delimiter;
}
ListObjectsRequest listObjectRequest = new ListObjectsRequest().withBucketName(bucketName)
.withPrefix(keyPrefix).withDelimiter(delimiter);
ObjectListing objectListing;
do {
objectListing = s3Client.listObjects(listObjectRequest);
paths.addAll(objectListing.getCommonPrefixes());
listObjectRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());
return paths;
}
Upvotes: 5
Views: 12225
Reputation: 33384
There is a new utility class — S3Objects
— that provides an easy way to iterate Amazon S3 objects in a "foreach" statement. Use its withPrefix
method and then just iterate them. You can use filters and streams as well.
Here is an example (Kotlin):
val s3 = AmazonS3ClientBuilder
.standard()
.withCredentials(EnvironmentVariableCredentialsProvider())
.build()
S3Objects
.withPrefix(s3, bucket, folder)
.filter { s3ObjectSummary ->
s3ObjectSummary.key.endsWith(".gz")
}
.parallelStream()
.forEach { s3ObjectSummary ->
CSVParser.parse(
GZIPInputStream(s3.getObject(s3ObjectSummary.bucketName, s3ObjectSummary.key).objectContent),
StandardCharsets.UTF_8,
CSVFormat.DEFAULT
).use { csvParser ->
…
}
}
Upvotes: 8
Reputation: 131
getCommonPrefixes()
only lists the prefixes, not the actual keys. From the documentation:
For example, consider a bucket that contains the following keys:
- "foo/bar/baz"
- "foo/bar/bash"
- "foo/bar/bang"
- "foo/boo"
If calling listObjects with the prefix="foo/" and the delimiter="/" on this bucket, the returned S3ObjectListing will contain one entry in the common prefixes list ("foo/bar/") and none of the keys beginning with that common prefix will be included in the object summaries list.
Instead, use getObjectSummaries()
to get the keys. You also need to remove withDelimiters()
. This causes S3 to only list items in the current 'directory.' This method works for me:
public static List<String> getObjectsListFromS3(AmazonS3 s3, String bucket, String prefix) {
final String delimiter = "/";
if (!prefix.endsWith(delimiter)) {
prefix = prefix + delimiter;
}
List<String> paths = new LinkedList<>();
ListObjectsRequest request = new ListObjectsRequest().withBucketName(bucket).withPrefix(prefix);
ObjectListing result;
do {
result = s3.listObjects(request);
for (S3ObjectSummary summary : result.getObjectSummaries()) {
// Make sure we are not adding a 'folder'
if (!summary.getKey().endsWith(delimiter)) {
paths.add(summary.getKey());
}
}
request.setMarker(result.getMarker());
}
while (result.isTruncated());
return paths;
}
Consider an S3 bucket that contains the following keys:
particle.fs
test/
test/blur.fs
test/blur.vs
test/subtest/particle.fs
With this driver code:
public static void main(String[] args) {
String bucket = "playground-us-east-1-1234567890";
AmazonS3 s3 = AmazonS3ClientBuilder.standard().withRegion("us-east-1").build();
String prefix = "test";
for (String key : getObjectsListFromS3(s3, bucket, prefix)) {
System.out.println(key);
}
}
produces:
test/blur.fs
test/blur.vs
test/subtest/particle.fs
Upvotes: 5
Reputation: 776
Here is an example about how to get all files in the directory, hope can help you :
public static List<String> getAllFile(String directoryPath,boolean isAddDirectory) {
List<String> list = new ArrayList<String>();
File baseFile = new File(directoryPath);
if (baseFile.isFile() || !baseFile.exists()) {
return list;
}
File[] files = baseFile.listFiles();
for (File file : files) {
if (file.isDirectory()) {
if(isAddDirectory){
list.add(file.getAbsolutePath());
}
list.addAll(getAllFile(file.getAbsolutePath(),isAddDirectory));
} else {
list.add(file.getAbsolutePath());
}
}
return list;
}
Upvotes: -3