ChrisFNZ

Reputation: 617

AWS CLI S3 CP performance is painfully slow

I've got an issue whereby uploads to and downloads from AWS S3 via the AWS CLI are very slow. By very slow I mean it consistently takes around 2.3s for a 211KB file, which works out to an average transfer rate of under 100KB/s, extremely slow for such a small file. My webapp is heavily reliant on internal APIs, and I've narrowed down that the bulk of the APIs' round-trip time is spent uploading and downloading files from S3.

So to summarise:

I need to improve AWS CLI S3 download performance because the API is going to be quite heavily used in the future.
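
For reference, a single timed copy from a shell is enough to reproduce what I'm seeing (the bucket and object names here are placeholders):

time aws s3 cp s3://bucketname/objectname.txt .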

Upvotes: 12

Views: 23436

Answers (6)

yucer

Reputation: 5039

Sometimes the cause is a DNS service somewhere in the network path returning broken AAAA records. Those are IPv6 addresses that are tried, and fail, before the IPv4 attempt starts. In that case you can use the dual-stack endpoints for S3 in your app. If you are using the CLI from your workstation, switching to another DNS service might be the solution.
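
If you want to try the dual-stack endpoints from the CLI itself, they can be switched on in ~/.aws/config; the two settings below are the ones the S3 documentation pairs together (whether it helps depends on IPv6 actually working on your path):

[default]
s3 =
  use_dualstack_endpoint = true
  addressing_style = virtual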

Upvotes: 0

Cornelius Roemer

Reputation: 7929

While my download speeds weren't as slow as yours, I managed to max out my ISP's download bandwidth with aws s3 cp by adding the following configuration to my ~/.aws/config:

[default]
s3 =
  max_concurrent_requests = 200
  max_queue_size = 5000
  multipart_threshold = 4MB
  multipart_chunksize = 4MB

If you don't want to edit the config file by hand, you can write the same values with aws configure set instead. Have a look at the documentation: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html
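
For example, these aws configure set commands store the same settings as above in the default profile:

aws configure set default.s3.max_concurrent_requests 200
aws configure set default.s3.max_queue_size 5000
aws configure set default.s3.multipart_threshold 4MB
aws configure set default.s3.multipart_chunksize 4MB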

Upvotes: 9

Aditya

Reputation: 1

You may try using boto3 to download files instead of aws s3 cp.

Refer to Downloading a File from an S3 Bucket
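
A minimal sketch of the boto3 approach (the bucket name, key, and local filename are placeholders):

import boto3

# Create the client once and reuse it, so repeated downloads
# don't pay connection setup again
s3 = boto3.client("s3")
s3.download_file("my-bucket", "path/to/my/object", "local-file.txt")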

Upvotes: 0

MagicLAMP

Reputation: 1072

AWS S3 is slow and painfully complex, and you can't easily search for files. Used with CloudFront it is faster, and there are supposed to be other advantages, but the complexity shifts from very complex to insanely complex: caching obfuscates any file changes, and invalidating the cache is hit and miss unless you change the file name, which in turn means changing the reference in the page that uses that file.

In practice, particularly if all or most of your traffic is located in the same region as your load balancer, I have found that even a low-spec web server in the same region is faster by a factor of 10. If you need multiple web servers attached to a common volume, AWS only provides that in certain regions, so I got around it by using NFS to share the volume across the web servers. This gives you a file system mounted on a server you can log in to, to list and find files. S3 has become a turnkey solution for a problem that was solved better a couple of decades ago.
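
A rough sketch of that NFS arrangement (the export path, subnet, and server address below are hypothetical, not the original setup):

# On the server that owns the volume, export it in /etc/exports:
/var/app_data 10.0.0.0/24(rw,sync,no_subtree_check)

# Re-export after editing /etc/exports
sudo exportfs -ra

# On each web server, mount the shared volume
sudo mount -t nfs 10.0.0.10:/var/app_data /var/app_data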

Upvotes: 0

GChamon

Reputation: 476

I found that if I try to download an object using aws s3 cp, the download hangs close to finishing whenever the object is larger than 500MB.

However, using get-object directly causes no hang or slowdown whatsoever. So instead of

aws s3 cp s3://my-bucket/path/to/my/object .

I get the object with

aws s3api get-object --bucket my-bucket --key path/to/my/object out-file

and experience no slowdown. My guess is that aws s3 cp layers concurrent multipart transfer logic on top of the API, while get-object issues a single plain GET.

Upvotes: 4

ChrisFNZ

Reputation: 617

Okay, this was a combination of things.

I'd had problems with the AWS PHP API SDK previously (mainly orphaned threads when copying files), so I had changed my APIs to call the AWS CLI instead, for simplicity and reliability. Although that worked, I ran into a few performance issues:

  • Firstly, because my instance had role-based access to my S3 buckets, the AWS CLI was taking around 1.7s just to determine which region my buckets were in. Configuring the CLI with a default region (see the example after this list) overcame this
  • Secondly, because PHP has to spawn a whole new shell for every exec() call (e.g. exec("aws s3 cp s3://bucketname/objectname.txt /var/app_path/objectname.txt")), that is a very slow exercise. I know it's possible to offload shell commands via Gearman or similar, but since simplicity was one of my goals I didn't want to go down that road
  • Finally, because the AWS CLI runs on Python, it takes almost 0.4s just to start up before it even begins processing a command. That might not seem like a lot, but once my API is in production use it will add up for users and infrastructure alike
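
Setting the default region is a one-liner with the CLI's own config command (the region below is just an example; use whichever region your buckets actually live in):

aws configure set default.region ap-southeast-2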

To cut a long story short, I've done two things:

  • Reverted to using the AWS PHP API SDK instead of the AWS CLI
  • Referred to the correct S3 region name within my PHP code (sketched below)
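
A minimal sketch of what that looks like with version 3 of the AWS SDK for PHP (the region, bucket, key, and path are placeholders rather than my real values):

require 'vendor/autoload.php';

use Aws\S3\S3Client;

// Pinning the client to the buckets' actual region avoids the
// region-discovery round trip on every call.
$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'ap-southeast-2',
]);

$s3->getObject([
    'Bucket' => 'bucketname',
    'Key'    => 'objectname.txt',
    'SaveAs' => '/var/app_path/objectname.txt',
]);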

My APIs are now performing much better, i.e. from 2.3s down to an average of around 0.07s.

This doesn't make my original issue go away, but at least performance is much better.

Upvotes: 3
