Reputation: 1844
I am trying to print the metadata of all the objects in an S3 bucket, but the listing never returns more than 1000 objects. I tried using objectListing.isTruncated() and it did not help. Here is a sample of the code I used to try to list more than 1000 objects.
ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
        .withBucketName(bucketName);
ObjectListing objectListing;
do {
    objectListing = s3client.listObjects(listObjectsRequest);
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        System.out.println(" - " + objectSummary.getKey() + " " +
                "(size = " + objectSummary.getSize() + ")");
        listObjectsRequest.setMarker(objectListing.getNextMarker());
    }
    listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());
Upvotes: 8
Views: 6642
Reputation: 33461
Amazon recently published AWS SDK for Java 2.x. The API changed, so here is an SDK 2.x version:
S3Client client = S3Client.builder().region(Region.US_EAST_1).build();
ListObjectsV2Request request = ListObjectsV2Request.builder()
        .bucket("the-bucket")
        .prefix("the-prefix")
        .build();
ListObjectsV2Iterable response = client.listObjectsV2Paginator(request);

for (ListObjectsV2Response page : response) {
    page.contents().forEach(x -> System.out.println(x.key()));
}
ListObjectsV2Iterable is lazy as well:
When the operation is called, an instance of this class is returned. At this point, no service calls are made yet and so there is no guarantee that the request is valid. As you iterate through the iterable, SDK will start lazily loading response pages by making service calls until there are no pages left or your iteration stops. If there are errors in your request, you will see the failures only after you start iterating through the iterable.
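Because the pages load lazily, you can also flatten the pagination away entirely through the paginator's contents() view; a minimal sketch, reusing the client and request built above:

// Lazily iterates over every object across all pages; request errors
// only surface once iteration actually starts.
client.listObjectsV2Paginator(request)
        .contents()
        .forEach(obj -> System.out.println(obj.key() + " (size = " + obj.size() + ")"));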
Upvotes: 6
Reputation: 33461
For all those reading this in 2018 or later: there is a newer API in the Java SDK that lets you iterate through the objects in an S3 bucket very easily, without wrestling with pagination:
AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
S3Objects.inBucket(s3, "bucket").forEach((S3ObjectSummary objectSummary) -> {
    // TODO: Consume `objectSummary` the way you need
    // System.out.println(objectSummary.getKey());
});
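If you only care about keys under a given prefix, the same helper can restrict the listing as well; a small sketch, with the bucket name and prefix as placeholders:

// Same lazy, internally paged iteration, limited to keys starting with "logs/".
S3Objects.withPrefix(s3, "bucket", "logs/").forEach((S3ObjectSummary objectSummary) -> {
    System.out.println(objectSummary.getKey() + " (size = " + objectSummary.getSize() + ")");
});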
Upvotes: 13
Reputation: 1844
This solved my problem. I set the marker after each page and looped while the listing was truncated, which let me print all the objects (more than 1000).
// Ask for the output path once and open the writer before paging through the bucket.
System.out.println("Enter the path where to save your file");
Scanner scan = new Scanner(System.in);
String path = scan.nextLine();
File fileOne = new File(path);
FileWriter fw = new FileWriter(fileOne.getAbsoluteFile(), true);
BufferedWriter bw = new BufferedWriter(fw);
bw.write("Writing data to file");
bw.write("\n");

ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
        .withBucketName(bucketName);
ObjectListing objectListing;
do {
    objectListing = s3.listObjects(listObjectsRequest);
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        String key = objectSummary.getKey();
        String dummyKey = key.substring(2);
        if (dummyKey.equalsIgnoreCase("somestring")) {
            S3Object s3object = s3.getObject(new GetObjectRequest(bucketName, key));
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(s3object.getObjectContent()));
            String line;
            int i = 0;
            while ((line = reader.readLine()) != null) {
                if (i > 0) { // skip the first line of each object
                    bw.append(line + "," + s3object.getKey().substring(0, 2));
                    bw.append(objectSummary.getLastModified().toString());
                    bw.newLine();
                }
                i++;
                System.out.println(line);
            }
            reader.close();
        }
    }
    // Advance the marker to the next page and keep looping while the listing is truncated.
    listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());
bw.close();
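For comparison, the same pagination can also be written against the V2 listing API of the 1.x SDK, which uses continuation tokens instead of markers; a minimal sketch, assuming the same s3 client and bucketName as above:

ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucketName);
ListObjectsV2Result result;
do {
    result = s3.listObjectsV2(req);
    for (S3ObjectSummary summary : result.getObjectSummaries()) {
        System.out.println(summary.getKey() + " (size = " + summary.getSize() + ")");
    }
    // Continue the listing from where the previous page ended.
    req.setContinuationToken(result.getNextContinuationToken());
} while (result.isTruncated());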
Upvotes: 4