Reputation: 617
I've got an issue whereby uploads to and downloads from AWS S3 via the aws cli are very slow. By very slow I mean it consistently takes around 2.3s for a 211KB file, which works out to an average download speed of roughly 90KB/s, extremely slow for such a small file. My webapp is heavily reliant on internal APIs, and I've narrowed down that the bulk of each API's round-trip time is spent uploading files to and downloading files from S3.
Some details:
So to summarise:
I need to improve AWS CLI S3 download performance because the API is going to be quite heavily used in the future.
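For reference, the 2.3s figure comes from simple timed copies along these lines (the bucket and key here are placeholders, not my real ones):

time aws s3 cp s3://my-bucket/some-211k-file.json .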
Upvotes: 12
Views: 23436
Reputation: 5039
Sometimes the cause is a DNS service returning broken AAAA records somewhere in the network path to the server. Those are IPv6 addresses that are tried without success before the IPv4 attempt starts. In such a case you can use the dual-stack endpoints for S3 in your app. If you are using the CLI from your workstation, switching to another DNS service might be the solution.
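For the CLI specifically, a minimal sketch of enabling the dual-stack endpoint via the S3 configuration (the bucket and region below are assumptions, adjust to yours):

[default]
s3 =
  use_dualstack_endpoint = true

or for a single command:

aws s3 cp s3://my-bucket/file.json . --endpoint-url https://s3.dualstack.eu-west-1.amazonaws.com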
Upvotes: 0
Reputation: 7929
While my download speeds weren't as slow as yours, I managed to max out my ISP's download bandwidth with aws s3 cp by adding the following configuration to my ~/.aws/config:
[default]
s3 =
  max_concurrent_requests = 200
  max_queue_size = 5000
  multipart_threshold = 4MB
  multipart_chunksize = 4MB
If you don't want to edit the config file by hand, you can set the same values from the command line with aws configure set. Have a look at the documentation: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html
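For example, mirroring the config above (same values as shown there):

aws configure set s3.max_concurrent_requests 200
aws configure set s3.max_queue_size 5000
aws configure set s3.multipart_threshold 4MB
aws configure set s3.multipart_chunksize 4MB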
Upvotes: 9
Reputation: 1
You may try using boto3 to download files instead of aws s3 cp.
Refer to Downloading a File from an S3 Bucket.
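A minimal sketch (the bucket, key, and tuning values are placeholders; the TransferConfig deliberately mirrors the CLI settings from the answer above):

import boto3
from boto3.s3.transfer import TransferConfig

# Transfer tuning roughly equivalent to the CLI's s3 config settings
config = TransferConfig(
    max_concurrency=20,
    multipart_threshold=4 * 1024 * 1024,  # 4MB
    multipart_chunksize=4 * 1024 * 1024,  # 4MB
)

s3 = boto3.client("s3")

# my-bucket and path/to/my/object are placeholders
s3.download_file("my-bucket", "path/to/my/object", "out-file", Config=config)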
Upvotes: 0
Reputation: 1072
AWS S3 is slow and painfully complex, and you can't easily search for files. Used with CloudFront it is faster and there are supposed to be advantages, but the complexity shifts from very complex to insanely complex: caching obfuscates any file changes, and invalidating the cache is hit and miss unless you change the file name, which in turn means updating every page that references that file.
In practice, particularly if all or most of your traffic is in the same region as your load balancer, I have found that even a low-spec web server in the same region is faster by a factor of 10. If you need multiple web servers attached to a common volume, AWS only provides that in certain regions, so I got around it by using NFS to share the volume across the web servers (a sketch follows). This gives you a file system mounted on a server you can log in to, list, and search. S3 has become a turnkey solution for a problem that was solved better a couple of decades ago.
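A minimal sketch of that NFS setup, assuming a Debian-family server at 10.0.0.5 exporting /var/www (all paths and addresses are illustrative):

# On the server that owns the volume: add the export to /etc/exports
/var/www 10.0.0.0/24(rw,sync,no_subtree_check)

# Then reload the export table
sudo exportfs -ra

# On each additional web server: install the NFS client and mount the share
sudo apt install nfs-common
sudo mount -t nfs 10.0.0.5:/var/www /var/www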
Upvotes: 0
Reputation: 476
I found that if I try to download an object using aws s3 cp, the download hangs close to finishing when the object size is greater than 500MB.
However, using get-object directly causes no hang or slowdown whatsoever. Therefore, instead of
aws s3 cp s3://my-bucket/path/to/my/object .
I get the object with
aws s3api get-object --bucket my-bucket --key path/to/my/object out-file
and experience no slowdown.
Upvotes: 4
Reputation: 617
Okay, this was a combination of things.
I'd had problems with the AWS PHP SDK previously (mainly related to orphaned threads when copying files), so I had changed my APIs to use the AWS CLI for simplicity and reliability. Although the CLI worked, I encountered a few performance issues:
To cut a long story short, I've done two things:
My APIs are now performing much better, i.e. from 2.3s down to an average of around 0.07s.
This doesn't make my original issue go away, but at least performance is much better.
Upvotes: 3