ChrisFNZ

Reputation: 617

AWS CLI S3 CP performance is painfully slow

I've got an issue whereby uploads to and downloads from AWS S3 via the AWS CLI are very slow. By very slow I mean it consistently takes around 2.3s for a 211KB file, which works out to an average transfer rate of under 100KB/s, extremely slow for such a small file. My webapp is heavily reliant on internal APIs, and I've narrowed down that the bulk of the APIs' round-trip time is spent uploading and downloading files from S3.

So to summarise:

I need to improve AWS CLI S3 download performance because the API is going to be quite heavily used in the future.
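
For reference, a single timed copy from a shell is enough to reproduce what I'm seeing (the bucket and object names here are placeholders):

time aws s3 cp s3://bucketname/objectname.txt .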

Upvotes: 12

Views: 23436

Answers (6)

yucer

Reputation: 5039

Sometimes the cause is a DNS service somewhere in the network path returning broken AAAA records. Those are IPv6 addresses that are tried, and fail, before the IPv4 attempt starts. In that case you can use the dual-stack endpoints for S3 in your app. If you are using the CLI from your workstation, switching to another DNS service might be the solution.
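
If you want to try the dual-stack endpoints from the CLI itself, they can be switched on in ~/.aws/config; the two settings below are the ones the S3 documentation pairs together (whether it helps depends on IPv6 actually working on your path):

[default]
s3 =
  use_dualstack_endpoint = true
  addressing_style = virtual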

Upvotes: 0

Cornelius Roemer

Reputation: 7929

While my download speeds weren't as slow as yours, I managed to max out my ISP's download bandwidth with aws s3 cp by adding the following configuration to my ~/.aws/config:

[default]
s3 =
  max_concurrent_requests = 200
  max_queue_size = 5000
  multipart_threshold = 4MB
  multipart_chunksize = 4MB

If you don't want to edit the config file by hand, you can write the same values with aws configure set instead. Have a look at the documentation: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html
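
For example, these aws configure set commands store the same settings as above in the default profile:

aws configure set default.s3.max_concurrent_requests 200
aws configure set default.s3.max_queue_size 5000
aws configure set default.s3.multipart_threshold 4MB
aws configure set default.s3.multipart_chunksize 4MB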

Upvotes: 9

Aditya

Reputation: 1

You may try using boto3 to download files instead of aws s3 cp.

Refer to Downloading a File from an S3 Bucket
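
A minimal sketch of the boto3 approach (the bucket name, key, and local filename are placeholders):

import boto3

# Create the client once and reuse it, so repeated downloads
# don't pay connection setup again
s3 = boto3.client("s3")
s3.download_file("my-bucket", "path/to/my/object", "local-file.txt")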

Upvotes: 0

MagicLAMP

Reputation: 1072

AWS S3 is slow and painfully complex, and you can't easily search for files. Used with CloudFront it is faster, and there are supposed to be other advantages, but the complexity shifts from very complex to insanely complex: caching obfuscates any file changes, and invalidating the cache is hit and miss unless you change the file name, which in turn means changing the reference in the page that uses that file.

In practice, particularly if all or most of your traffic is located in the same region as your load balancer, I have found that even a low-spec web server in the same region is faster by a factor of 10. If you need multiple web servers attached to a common volume, AWS only provides that in certain regions, so I got around it by using NFS to share the volume across the web servers. This gives you a file system mounted on a server you can log in to, to list and find files. S3 has become a turnkey solution for a problem that was solved better a couple of decades ago.
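
A rough sketch of that NFS arrangement (the export path, subnet, and server address below are hypothetical, not the original setup):

# On the server that owns the volume, export it in /etc/exports:
/var/app_data 10.0.0.0/24(rw,sync,no_subtree_check)

# Re-export after editing /etc/exports
sudo exportfs -ra

# On each web server, mount the shared volume
sudo mount -t nfs 10.0.0.10:/var/app_data /var/app_data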

Upvotes: 0

GChamon

Reputation: 476

I found that if I try to download an object using aws s3 cp, the download hangs close to finishing whenever the object is larger than 500MB.

However, using get-object directly causes no hang or slowdown whatsoever. So instead of

aws s3 cp s3://my-bucket/path/to/my/object .

I get the object with

aws s3api get-object --bucket my-bucket --key path/to/my/object out-file

and experience no slowdown. My guess is that aws s3 cp layers concurrent multipart transfer logic on top of the API, while get-object issues a single plain GET.

Upvotes: 4

ChrisFNZ

Reputation: 617

Okay, this was a combination of things.

I'd had problems with the AWS PHP API SDK previously (mainly orphaned threads when copying files), so I had changed my APIs to call the AWS CLI instead, for simplicity and reliability. Although that worked, I ran into a few performance issues:

  • Firstly, because my instance had role-based access to my S3 buckets, the AWS CLI was taking around 1.7s just to determine which region my buckets were in. Configuring the CLI with a default region (see the example after this list) overcame this
  • Secondly, because PHP has to spawn a whole new shell for every exec() call (e.g. exec("aws s3 cp s3://bucketname/objectname.txt /var/app_path/objectname.txt")), that is a very slow exercise. I know it's possible to offload shell commands via Gearman or similar, but since simplicity was one of my goals I didn't want to go down that road
  • Finally, because the AWS CLI runs on Python, it takes almost 0.4s just to start up before it even begins processing a command. That might not seem like a lot, but once my API is in production use it will add up for users and infrastructure alike
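
Setting the default region is a one-liner with the CLI's own config command (the region below is just an example; use whichever region your buckets actually live in):

aws configure set default.region ap-southeast-2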

To cut a long story short, I've done two things:

  • Reverted to using the AWS PHP API SDK instead of the AWS CLI
  • Referred to the correct S3 region name within my PHP code (sketched below)
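
A minimal sketch of what that looks like with version 3 of the AWS SDK for PHP (the region, bucket, key, and path are placeholders rather than my real values):

require 'vendor/autoload.php';

use Aws\S3\S3Client;

// Pinning the client to the buckets' actual region avoids the
// region-discovery round trip on every call.
$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'ap-southeast-2',
]);

$s3->getObject([
    'Bucket' => 'bucketname',
    'Key'    => 'objectname.txt',
    'SaveAs' => '/var/app_path/objectname.txt',
]);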

My APIs are now performing much better, i.e. from 2.3s down to an average of around 0.07s.

This doesn't make my original issue go away, but at least performance is much better.

Upvotes: 3
