Ketan Chaudhari

Reputation: 433

Download file from URL and upload it to AWS S3 without saving into memory using AWS SDK for Java, version 2

I am writing code that downloads a file from a URL and uploads it to S3, without storing it temporarily in a file or in memory. I am downloading through an 'InputStream', but AWS S3 requires the file size, which I don't have from the 'InputStream'. Is there any other way? I found this discussion on the same topic using 'Node.js'.


My code to fetch the file into an InputStream:


HttpClient client = HttpClient.newBuilder().build();
URI uri = URI.create("{myUrl}");
HttpRequest request = HttpRequest.newBuilder().uri(uri).build();
InputStream is = client.send(request, HttpResponse.BodyHandlers.ofInputStream()).body();

Code I tried to use to put the object into S3, but I don't have content_length:


S3Client s3Client = S3Client.builder().build();
PutObjectRequest objectRequest = PutObjectRequest.builder()
                            .bucket(BUCKET_NAME)
                            .key(KEY)
                            .build();

PutObjectResponse por = s3Client.putObject(objectRequest, RequestBody.fromInputStream(is,content_length));

Upvotes: 5

Views: 6043

Answers (2)

Parsifal

Reputation: 4516

You have a few options.

The easiest is to retain the HttpResponse from your client.send() and read the Content-Length header from it. You should also look for headers like Content-Type, and store them as metadata on the S3 object.
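A sketch of that approach, reusing the question's own variables (BUCKET_NAME, KEY, and the "{myUrl}" placeholder); the orElseThrow/orElse fallbacks are one way to handle a server that omits a header:

```java
HttpClient client = HttpClient.newBuilder().build();
HttpRequest request = HttpRequest.newBuilder().uri(URI.create("{myUrl}")).build();

// Keep the whole response, not just the body, so the headers are available
HttpResponse<InputStream> response =
        client.send(request, HttpResponse.BodyHandlers.ofInputStream());

long contentLength = response.headers()
        .firstValueAsLong("Content-Length")
        .orElseThrow(() -> new IllegalStateException("server sent no Content-Length"));
String contentType = response.headers()
        .firstValue("Content-Type")
        .orElse("application/octet-stream");

S3Client s3Client = S3Client.builder().build();
PutObjectRequest objectRequest = PutObjectRequest.builder()
        .bucket(BUCKET_NAME)
        .key(KEY)
        .contentType(contentType)   // carry the origin server's metadata over
        .build();

s3Client.putObject(objectRequest,
        RequestBody.fromInputStream(response.body(), contentLength));
```

This streams the body straight through: nothing is buffered beyond the SDK's internal chunks, since the length is known up front.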

That isn't guaranteed to work in all cases: some servers do not provide Content-Length. In that case you need to create a multipart upload to send the file. When doing this, you buffer relatively small chunks (minimum 5 MB each) in memory, and can upload up to 10,000 chunks. You must either complete or abort the upload, or configure your bucket to delete uncompleted uploads after a certain period of time; otherwise you'll be charged for the storage used by the incomplete upload's parts.
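A sketch of that multipart flow with the V2 SDK, assuming an existing S3Client (s3Client), the stream (is) from the question, and the same BUCKET_NAME/KEY constants; the try/catch aborts the upload on failure so orphaned parts aren't left behind:

```java
// Stream an InputStream of unknown length to S3, one 5 MB part at a time.
String uploadId = s3Client.createMultipartUpload(
        CreateMultipartUploadRequest.builder().bucket(BUCKET_NAME).key(KEY).build()
).uploadId();

try {
    List<CompletedPart> completedParts = new ArrayList<>();
    byte[] buffer = new byte[5 * 1024 * 1024];   // 5 MB minimum part size
    int partNumber = 1;
    int read;
    // readNBytes (Java 9+) blocks until the buffer is full or the stream ends,
    // so only the final part can be short -- exactly what S3 requires
    while ((read = is.readNBytes(buffer, 0, buffer.length)) > 0) {
        UploadPartResponse part = s3Client.uploadPart(
                UploadPartRequest.builder()
                        .bucket(BUCKET_NAME).key(KEY)
                        .uploadId(uploadId).partNumber(partNumber)
                        .build(),
                RequestBody.fromBytes(Arrays.copyOf(buffer, read)));
        completedParts.add(CompletedPart.builder()
                .partNumber(partNumber).eTag(part.eTag()).build());
        partNumber++;
    }
    s3Client.completeMultipartUpload(CompleteMultipartUploadRequest.builder()
            .bucket(BUCKET_NAME).key(KEY).uploadId(uploadId)
            .multipartUpload(CompletedMultipartUpload.builder()
                    .parts(completedParts).build())
            .build());
} catch (Exception e) {
    // Abort so the partial parts aren't retained (and billed)
    s3Client.abortMultipartUpload(AbortMultipartUploadRequest.builder()
            .bucket(BUCKET_NAME).key(KEY).uploadId(uploadId).build());
    throw e;
}
```

Peak memory use is roughly one part's worth of data (plus the defensive copy), regardless of the total object size.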

A third alternative is to use the V1 SDK, which provides TransferManager. It handles the multipart upload for you and uses multiple threads to improve throughput, but it has not yet been implemented for V2.
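For reference, a minimal V1 sketch (bucketName, key, and inputStream are placeholders); note that if you hand V1's TransferManager an InputStream without setting a content length on the metadata, it will buffer the stream in memory itself:

```java
TransferManager tm = TransferManagerBuilder.standard().build();

ObjectMetadata metadata = new ObjectMetadata();
// If the length is known after all, set it so the upload can stream:
// metadata.setContentLength(contentLength);

Upload upload = tm.upload(bucketName, key, inputStream, metadata);
upload.waitForCompletion();   // blocks until done; throws on failure

tm.shutdownNow(false);        // false = leave the underlying client open if shared
```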

Upvotes: 1

smac2020

Reputation: 10734

"Code I tried to insert into S3, but I don't have content_length"

To get around needing the content length: instead of using an InputStream, which does require the content length, you can use a byte[], as described here.

https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/core/sync/RequestBody.html#fromBytes-byte:A-

How you get the byte array depends on the application you are building. In some apps, the byte array comes from a file posted to a web app; in others, from a file read at a known location. Either way, your app has to obtain a byte array somehow and use that data to upload content to an S3 bucket.

If your app has an InputStream (which it seems you do, based on your description), convert it to a byte[] using standard Java. Once you have the byte[], you can call putObject, as shown here.

public String putObject(byte[] data, String bucketName, String objectKey) {

    S3Client s3 = getClient();   // helper that returns a configured S3Client

    try {
        // Put the object into the bucket
        PutObjectResponse response = s3.putObject(PutObjectRequest.builder()
                        .bucket(bucketName)
                        .key(objectKey)
                        .build(),
                RequestBody.fromBytes(data));

        return response.eTag();

    } catch (S3Exception e) {
        System.err.println(e.getMessage());
        System.exit(1);
    }
    return "";
}
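If the file fits comfortably in memory, Java 9's InputStream.readAllBytes() is the simplest way to produce that byte[]; be aware this buffers the entire file, which is the trade-off the question was trying to avoid (StreamToBytes and toBytes are just illustrative names):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamToBytes {

    // Reads the whole stream into memory; fine for small objects only.
    static byte[] toBytes(InputStream is) throws IOException {
        return is.readAllBytes();   // Java 9+
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new ByteArrayInputStream("hello s3".getBytes());
        byte[] data = toBytes(is);
        System.out.println(data.length);   // prints 8
    }
}
```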

Upvotes: 1
