David Osipyan

Reputation: 65

S3Client of AWS Node.js SDK v3 only sends a limited number of GetObjectCommand requests

I tried to get all objects in an AWS S3 bucket using @aws-sdk/client-s3. Instead of fetching 1000 objects, the program exits after 50 object downloads.

In the while() loop, obj_list.Contents.length equals 1000, but the process exits after receiving the responses of only 50 GetObjectCommand requests.

import { S3Client, ListObjectsV2Command, GetObjectCommand } from "@aws-sdk/client-s3"

(async () => {
    const client = new S3Client({
        credentials:{
            accessKeyId:'XXXXXXXXXXXXXXXXXXXXX',
            secretAccessKey:'XXXXXXXXXXXXXXXXXXXXX'
        },
    
        region: "us-east-1"
    })

    const input = {
        Bucket: 'Bucket-Name'
    }
    const cmd = new ListObjectsV2Command(input)
    const obj_list = await client.send(cmd)

    let i = 0
    while (i < obj_list.Contents.length) {
        const command = new GetObjectCommand({
            Bucket: 'Bucket-Name',
            Key: obj_list.Contents[i++].Key
        })
        client.send(command)
            .then(
                (data) => {
                    console.log(`Content length: ${data.ContentLength}`)
                },
                (error) => {
                    const { requestId, cfId, extendedRequestId } = error.$metadata
                    console.log(`Error: ${requestId}, ${cfId}, ${extendedRequestId}`)
                }
            )
    }
    
    console.log("Done")
})();

console.log("End")

Here is the output in the Visual Studio Code console:

C:\Program Files\nodejs\node.exe .\test.js
End
Done
50
Content length: 38535294

What are the possible reasons for this?

UPD. Here is code that creates an array of Promises in batches, awaits them, and then creates the next slice (50 at a time in the code below). No difference: after 50 requests the script exits with status 13: "Process exited with code 13".

The statuses of all resolved Promises are 'fulfilled'.


// <list> contains all objects from the bucket as in the code above
// ...
const step = 50
let i = 0
while (i < obj_list.Contents.length) {
    const to = Math.min(i + step, obj_list.Contents.length)
    let promises = []
    for (let f = i; f < to; ++f) {
        promises.push(client.send(
            new GetObjectCommand({
                Bucket: 'Bucket',
                Key: obj_list.Contents[f].Key
            })
        ))
    }
    const statuses = await Promise.allSettled(promises)
    i = to
}

This code exits on await Promise.all(promises) with exit code 13:

const promises = obj_list.Contents.map(async (obj_cont) => {
    const command = new GetObjectCommand({
        Bucket: 'Bucket',
        Key: obj_cont.Key
    })
    const data = await client.send(command)
  });

const statuses = await Promise.all(promises)

Terminal output:

C:\Program Files\nodejs\node.exe .\async_batch.js
Process exited with code 13

Upvotes: 0

Views: 3264

Answers (4)

Murilo Salomão

Reputation: 140

I posted the solution to this same problem in another thread, so here is a link to it.

Basically, creating a new Agent with the keepAlive: false option and passing it in when instantiating the S3 client does the job. Here is an example of what it looks like:

import { Agent } from "https";
import { S3 } from "@aws-sdk/client-s3";

const s3 = new S3({
  requestHandler: {
    httpsAgent: new Agent({ keepAlive: false })
  }
});
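
For context: the SDK's default Node.js request handler uses a keep-alive agent capped at 50 sockets (maxSockets: 50), so requests whose response bodies are never consumed can pin the whole pool, which would explain the 50-request limit seen in the question. Disabling keepAlive sidesteps that pooling.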

I also used the @supercharge/promise-pool module to create a pool of Promises and execute 50 at a time, so as not to hit the limit on socket connections. It looks something like this:

import { PromisePool } from "@supercharge/promise-pool";
import { GetObjectCommand } from "@aws-sdk/client-s3";

const filenames = []; // array with Keys to the S3 Objects I aimed to download
const POOL_LIMIT = 50;

const downloadAndZipSingleFile = async (Key) => {
  // Bucket is the bucket name, defined elsewhere.
  const response = await s3.send(new GetObjectCommand({ Bucket, Key }));
  const body = await response.Body.transformToString();
  // whatever other operations you need...
};

await PromisePool
  .for(filenames)
  .withConcurrency(POOL_LIMIT)
  .process(downloadAndZipSingleFile);

Upvotes: 0

Adam Hewett

Reputation: 31

I ran into this exact problem today, and it took me ages to work out what could be causing such odd behaviour, as my script was also exiting after 50 iterations of the loop with the error "Process exited with code 13".

I've used the AWS S3 Client many times in the past for long running processes and never hit this problem before.

What got me to the answer in the end was inspecting the response from the GetObjectCommand, where I realised that the Body content of the response was left unresolved. In this instance I wasn't actually after the content of the object, just the metadata, and it seems there is a limit to the number of requests you can leave unresolved before the process exits (I'm sure someone cleverer than me can explain that better).

The solution for me was to simply read the response.Body content (even though I wasn't going to use it) inside the loop, as follows:

const response = await s3Client.send(
  new GetObjectCommand({
    Bucket: s3Bucket,
    Key: key,
  }),
);
const body = await response.Body.transformToString();

The script was then able to execute to completion. Hope this helps someone.
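
If only the metadata is needed, another option (an alternative sketch, not from the original answer, reusing the variables from the snippet above) is HeadObjectCommand, which returns headers such as ContentLength without a body stream to leave unconsumed:

import { HeadObjectCommand } from "@aws-sdk/client-s3";

// Fetches only the object's metadata; there is no Body to drain.
const head = await s3Client.send(
  new HeadObjectCommand({
    Bucket: s3Bucket,
    Key: key,
  }),
);
console.log(`Content length: ${head.ContentLength}`);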

Upvotes: 3

oieduardorabelo

Reputation: 2985

Update 2023-03-21

I created a working example on my GitHub. It can make it easier for you to compare/debug the code.

Parallel or Sequence execution of AWS SDK v3 S3 Client

Example of how to use the AWS SDK v3 S3 client to upload files to S3 in parallel or sequence execution.

Requirements

  • Run npm install to install the dependencies.
  • Set the BUCKET environment variable to a random bucket name.
  • Run node bucket-create.js to create the bucket.
  • After you run this experiment you can run node bucket-delete.js to delete the bucket.

Parallel

Using Promise.all to upload files in parallel.
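
Roughly, the parallel variant looks like this (a sketch, not the repo's exact code; the client setup and file list are placeholders):

import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const client = new S3Client({});
const Bucket = process.env.BUCKET;

// Placeholder file list for illustration.
const files = [
  { Key: "file-1.txt", Body: "one" },
  { Key: "file-2.txt", Body: "two" },
];

// Start all uploads at once and wait for every promise to resolve.
await Promise.all(
  files.map(({ Key, Body }) =>
    client.send(new PutObjectCommand({ Bucket, Key, Body }))
  )
);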

Sequence

Using a while loop to upload files in sequence.
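
And the sequential variant, reusing the same client and file list (again a sketch):

let i = 0;
while (i < files.length) {
  const { Key, Body } = files[i++];
  // Awaiting inside the loop serializes the uploads.
  await client.send(new PutObjectCommand({ Bucket, Key, Body }));
}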


Original answer

This is happening because you are using the callback form of the promise (.then) within the loop without awaiting it:

while (i < obj_list.Contents.length) {
  const command = // ...
  client.send(command)
    .then(
      (data) => {
        // callback
      },
      (error) => {
        // callback
      }
    )
}

The loop finishes before the JS event loop fires all the .then callbacks.

As suggested by @Ankush Jain, you can use await to resolve the promise:

while (i < obj_list.Contents.length) {
  try {
    const command = // ...
    const data = await client.send //... code here
    console.log(`Content length: ${data.ContentLength}`);
  } catch (error) {
    const { requestId, cfId, extendedRequestId } = error.$metadata
    //... code here
  }
}

This will trigger them sequentially.

If you need to perform the requests in parallel, you can create an array of promises and use Promise.all or Promise.allSettled to await for them.

const promises = obj_list.Contents.map(async (obj) => {
  const command = // ...
  const data = await client.send //... code here
}); // array of promises

await Promise.all(promises)

Upvotes: 0

Ankush Jain

Reputation: 7079

It seems you are missing an await before the send method.

await client.send(command);
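
Applied to the loop in the question, it would look something like this (a sketch):

for (const obj of obj_list.Contents) {
  const data = await client.send(
    new GetObjectCommand({ Bucket: 'Bucket-Name', Key: obj.Key })
  );
  console.log(`Content length: ${data.ContentLength}`);
}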

Refer to this example - List objects in an Amazon S3 bucket using an AWS SDK

Upvotes: 1
