Sergio Tx

Reputation: 3868

Servlet reading from Amazon S3 so slow

I need a servlet to return files from Amazon S3. Only the server has the credentials; the S3 bucket is not public, and I cannot change that. I was told to use data streams, but they are very slow. To test, I have a small project with thumbnails: clicking one opens a new tab with the full image. A 5 MB image takes about a minute to load.

The function that reads from S3 and returns the data stream:

public void downloadDirectlyFromS3(String s3Path, String fileName, HttpServletResponse response) {
    AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider());
    s3Client.setEndpoint(S3ENDPOINT);

    S3Object s3object = s3Client.getObject(new GetObjectRequest(s3Path, fileName));

    byte[] buffer = new byte[5 * 1024 * 1024];

    try {
        InputStream input = s3object.getObjectContent();
        ServletOutputStream output = response.getOutputStream();
        for (int length = 0; (length = input.read(buffer)) > 0;) {
            output.write(buffer, 0, length);
        }
        output.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Upvotes: 2

Views: 2735

Answers (3)

John Mercier

Reputation: 1705

There are two things that stand out that may be the cause of the problem.

public void downloadDirectlyFromS3(String s3Path, String fileName, HttpServletResponse response) {
    AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider()); // 1. new client for each request
    s3Client.setEndpoint(S3ENDPOINT);

    S3Object s3object = s3Client.getObject(new GetObjectRequest(s3Path, fileName)); // throws AmazonS3Exception if the key is missing

    byte[] buffer = new byte[5 * 1024 * 1024];

    try {
        InputStream input = s3object.getObjectContent(); // 2. input stream is never closed
        ServletOutputStream output = response.getOutputStream();
        for (int length = 0; (length = input.read(buffer)) > 0;) {
            output.write(buffer, 0, length);
        }
        output.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

The first change I would make is to create one client for the entire application and reuse it. This is probably the main cause of your issue. AWS clients are thread-safe and can be used by multiple requests at the same time. The client handles connection pooling and reuse, which helps speed up repeated requests.
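As a rough illustration of the "create once, reuse everywhere" idea, here is a minimal sketch that uses only the standard library. `SharedClient` is a hypothetical helper, not part of the AWS SDK; in the servlet, `T` would be `AmazonS3` and the factory would build the client.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: lazily creates one instance and hands the same
 * instance to every caller, so an expensive client is built only once.
 */
final class SharedClient<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    SharedClient(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            // Double-checked locking: only the first caller runs the factory.
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

A servlet could keep one `SharedClient` in a static field and call `get()` in each request instead of constructing a new client. A plain `static final` field initialised once would work just as well if lazy creation is not needed.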

The second change would be to correctly close resources. input is never closed and output is not closed on exceptions. Consider using try-with-resources.

try (InputStream input = s3object.getObjectContent();
     ServletOutputStream output = response.getOutputStream()) {
    // copy loop goes here; both streams are closed automatically
} catch (FileNotFoundException e) {
    e.printStackTrace(); // not thrown for a missing S3 object
} catch (IOException e) {
    e.printStackTrace(); // consider using a logger for exceptions
}

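Also, getObject does not signal a missing object with FileNotFoundException, so you don't have to catch it. In the v1 SDK a missing key raises an AmazonS3Exception (HTTP 404); according to the Javadoc, null is returned only when constraints set on the GetObjectRequest are not met.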
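The resource-handling advice above can be sketched as a small helper that relies only on `java.io`. `StreamCopy` and `copyAndClose` are hypothetical names, and the 8 KB buffer is an assumption: a few kilobytes is usually enough for a streaming copy, so the 5 MB buffer per request is not needed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    private StreamCopy() {}

    /**
     * Copies everything from input to output and closes both streams,
     * even if a read or write fails part-way through.
     * Returns the number of bytes copied.
     */
    static long copyAndClose(InputStream input, OutputStream output) throws IOException {
        long total = 0;
        // try-with-resources closes in reverse order: out first, then in.
        try (InputStream in = input; OutputStream out = output) {
            byte[] buffer = new byte[8 * 1024]; // assumed size, tune as needed
            int length;
            // read() returns -1 at end of stream
            while ((length = in.read(buffer)) != -1) {
                out.write(buffer, 0, length);
                total += length;
            }
        }
        return total;
    }
}
```

In the servlet this might be called as `StreamCopy.copyAndClose(s3object.getObjectContent(), response.getOutputStream())`, with the `IOException` handled or logged by the caller.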

Another consideration is that the endpoint seems to be hardcoded. If the application is running on an EC2 instance and your development machine is configured correctly, you can simply use the default client.

AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();

The builder will look up the endpoint for you.

When your application shuts down, consider calling s3Client.shutdown().

For more information I found this useful.

Upvotes: 3

Sergio Tx

Reputation: 3868

I found the answer. The problem was the logger. We are using log4j and it was set to DEBUG, so the entire contents of the stream were being written to the console. In case it happens to somebody else, here is the link where they say this should be avoided in production: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-logging.html#verbose-wire-logging

I also started using the TransferManager, as saravanakumar v suggested; it seems to be slightly faster.

For example, I downloaded a file called ip-to-country.bin. Here is a sample of the log output while debug was on:
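For reference, a log4j 1.x configuration along these lines keeps wire logging off. This is a sketch only: the appender name A1 and the pattern are assumptions, and the logger names should be checked against the logging page linked above for your SDK version.

```properties
# Keep everything at WARN unless a specific logger is raised on purpose.
log4j.rootLogger=WARN, A1
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%d [%t] %-5p %c - %m%n

# At DEBUG, wire logging prints every byte of each request and response.
# Leave it at WARN outside of short debugging sessions.
log4j.logger.org.apache.http.wire=WARN
log4j.logger.com.amazonaws=WARN
```

If the root logger is set to DEBUG for your own code, the two explicit entries at the bottom still keep the HTTP client and SDK quiet.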

2017-05-23 12:06:21,770 Wire (Wire.java:86) DEBUG - http-outgoing-0 << "[0xf0][0x1]BGY[0xb][0x0][0x1]ITY[0xb][0x10][0x1]I"
2017-05-23 12:06:21,824 Wire (Wire.java:72) DEBUG - http-outgoing-0 << "RY[0xb] [0x1]CHY[0xb]0[0x1]DEY[0xb]@[0x1]KZY[0xb]P[0x1]ITY[0xb]`[0x1]ESY[0xb][0x80][0x1]PLY[0xb][0xa0][0x1]GEY[0xb][0xb0][0x1]TJY[0xb][0xc0][0x1]DEY[0xb][0xd0][0x1]CHY[0xb][0xe0][0x1]CZY[0xc][0x0][0x1]GBY[0xc][0x10][0x1]ESY[0xc] [0x1]RUY[0xc]0[0x1]SKY[0xc]@[0x1]RUY[0xc]P[0x1]UZY[0xc]`[0x1]RUY[0xc]p[0x1]MDY[0xc][0x80][0x1]ITY[0xc][0x90][0x1]GBY[0xc][0xa0][0x1]ITY[0xc][0xc0][0x1]UAY[0xc][0xe0][0x1]SAY[0xc][0xf0][0x1]RUY[\r][0x0][0x1]NOY[\r] [0x1]HUY[\r]0[0x1]FRY[\r]@[0x1]DEY[\r]P[0x1]ESY[\r]`[0x1]HUY[\r]p[0x1]ESY[\r][0x80][0x1]GBY[\r][0xa0][0x1]DEY[\r][0xb0][0x1]ATY[\r][0xc0][0x1]DEY[\r][0xd0][0x1]RUY[\r][0xe0][0x1]SEY[0xe][0x0][0x1]NOY[0xe][0x10][0x1]RUY[0xe] [0x1]ESY[0xe]0[0x1]RUY[0xe]@[0x1]CHY[0xe]P[0x1]NGY[0xe]`[0x1]AZY[0xe]p[0x1]DEY[0xe][0x80][0x1]GBY[0xe][0x90][0x1]DEY[0xe][0xb0][0x1]GBY[0xe][0xc0][0x1]RUY[0xe][0xd0][0x1]HRY[0xe][0xe0][0x1]ATY[0xe][0xf0][0x1]RUY[0xf][0x0][0x1]ATY[0xf][0x10][0x1]RUY[0xf] [0x1]ESY[0xf]0[0x1]RUY[0xf]@[0x1]GBY[0xf]P[0x1]FRY[0xf]`[0x1]MTY[0xf]p[0x1]GBY[0xf][0x80][0x1]RUY[0xf][0xa0][0x1]EUY[0xf][0xb0][0x1]KZY[0xf][0xc0][0x1]RUY[0xf][0xd0][0x1]ITY[0xf][0xe0][0x1]BEY[0xf][0xf0][0x1]SEY[0x10][0x0][0x1]FRY[0x10][0x10][0x1]RUY[0x10] [0x1]BEY[0x10]0[0x1]GBY[0x10]@[0x1]MKY[0x10]`[0x1]DKY[0x10]p[0x1]ATY[0x10][0x80][0x1]RSY[0x10][0x90][0x1]ESY[0x10][0xa0][0x1]DEY[0x10][0xb0][0x1]CZY[0x10][0xc0][0x1]SEY[0x10][0xd0][0x1]GBY[0x10][0xe0][0x1]CYY[0x10][0xf0][0x1]ESY[0x11][0x0][0x1]NOY[0x11][0x10][0x1]DEY[0x11] [0x1]PLY[0x11]0[0x1]BGY[0x11]@[0x1]SEY[0x11]P[0x1]LTY[0x11]`[0x1]RSY[0x11]p[0x1]RUY[0x11][0x80][0x1]NLY[0x11][0x90][0x1]TRY[0x11][0xa0][0x1]RUY[0x11]

Upvotes: 1
