Reputation: 1484
We generate PDF files on a background worker in a Rails app hosted on Heroku; once generated, they are uploaded to Amazon S3. Both the Heroku app and the S3 bucket are located in the eu-west-1 region.
We are experiencing very slow uploads, even for very basic, small files. Look at this example:
Aws.config.update({
  region: 'eu-west-1',
  credentials: Aws::Credentials.new(ENV['S3_USER_KEY'], ENV['S3_USER_SECRET'])
})
S3_BUCKET = Aws::S3::Resource.new.bucket(ENV['S3_PRIVATE_BUCKET'])
file = Tempfile.new(["testfile", ".pdf"], encoding: "ascii-8bit").tap do |file|
  file.write("a"*5000)
end
Benchmark.bm do |x|
  x.report { S3_BUCKET.put_object(key: "testfile.pdf", body: file) }
end
      user     system      total        real
  0.020000   0.040000   0.060000 ( 40.499553)
I don't think I can make a simpler example: sending a file of 5000 characters to S3 from a Heroku one-off dyno takes 40 seconds.
Please note that I tested on both my home internet connection and a Heroku instance, and the results are almost the same. On the other hand, I use ForkLift.app as a GUI to browse my buckets, and uploading a file there is almost instantaneous.
I've been browsing through the aws-sdk documentation and couldn't find anything that explains such a slow upload.
Upvotes: 2
Views: 2791
Reputation: 1
My solution:
Benchmark.bm do |x|
  file.seek(0) # add this
  x.report { S3_BUCKET.put_object(key: "testfile.pdf", body: file) }
end
# user system total real
# 0.003022 0.000740 0.003762 (0.018425)
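A possible explanation for why the seek helps (this is an assumption on my part, not something confirmed in this thread or checked against the SDK source): after file.write, the read position of the Tempfile sits at the end of the file, so anything that reads the body from the current position gets zero bytes even though the file holds 5000 bytes on disk. The sketch below only demonstrates that position behaviour locally. It makes no S3 call, and the variable names are mine.

```ruby
require 'tempfile'

# Same setup as in the question: write 5000 bytes and do not rewind.
file = Tempfile.new(["testfile", ".pdf"], encoding: "ascii-8bit")
file.write("a" * 5000)
file.flush # push buffered bytes to disk so the size on disk is accurate

size_on_disk = File.size(file.path)

# Reading from the current position (end of file) returns an empty string.
bytes_before_rewind = file.read.bytesize

# After rewinding, the same read returns the whole content.
file.rewind
bytes_after_rewind = file.read.bytesize

puts "size on disk:       #{size_on_disk}"
puts "read before rewind: #{bytes_before_rewind}"
puts "read after rewind:  #{bytes_after_rewind}"

file.close! # close and delete the temporary file
```

If the upload really does read from the current position, a mismatch between the announced size and the bytes actually sent could explain a long wait, but that part is a guess.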
Upvotes: 0
Reputation: 1
I think you are missing content_type in put_object:
S3_BUCKET.put_object(key: "testfile.pdf", body: file, content_type: "application/pdf")
I ran the same benchmark with this option, and the upload time for a PDF file dropped from 20 s to 2 ms.
Hope this helps.
Upvotes: 0
Reputation: 43
I ran into a similar issue when creating an Aws::S3::Object and using the upload_file method. If I passed in a Tempfile object, uploading even a small file (~5KB) took ~40 seconds. However, passing in the Tempfile's path was stupendously faster (less than 1 second).
You may have needed to use the Aws::S3::Bucket method put_object for your own reasons, but put_object seems to only accept a String or IO, not a File or Tempfile path. If you could refactor to create an Aws::S3::Object and use upload_file, you could use this workaround.
require 'aws-sdk-s3'
require 'benchmark'
require 'tempfile'
s3_resource = Aws::S3::Resource.new(region: 'us-east-2')
file = Tempfile.new(["testfile", ".pdf"], encoding: "ascii-8bit").tap do |file|
  file.write("a"*5000)
end
Benchmark.bm do |x|
  x.report {
    obj = s3_resource.bucket('mybucket').object("testfile-as-object.pdf")
    # passing the Tempfile object is quite slow
    obj.upload_file(file)
  }
end
#       user     system      total        real
#   0.010359   0.006335   0.016694 ( 41.175544)
Benchmark.bm do |x|
  x.report {
    obj = s3_resource.bucket('mybucket').object("testfile-as-path.pdf")
    # passing the Tempfile object's path is massively faster than passing the Tempfile object itself
    obj.upload_file(file.path)
  }
end
#       user     system      total        real
#   0.004573   0.002032   0.006605 (  0.398605)
Upvotes: 0
Reputation: 659
It seems to be a problem with put_object and Tempfile.
Try reading the file into a string with IO.read first and passing that string as the body:
new_file = IO.read(file)
S3_BUCKET.put_object(key: "testfile.pdf", body: new_file)
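A variant of the same idea, as a rough sketch: File.binread reads the whole file by path as raw bytes, which may be the safer choice for a PDF because it avoids any text-mode encoding handling. The flush call and the variable names are my own additions, and the S3 call is left out so the snippet runs without credentials.

```ruby
require 'tempfile'

# Same test file as in the question.
file = Tempfile.new(["testfile", ".pdf"], encoding: "ascii-8bit")
file.write("a" * 5000)
file.flush # make sure buffered bytes are on disk before reading by path

# Read the complete content as a binary (ASCII-8BIT) string,
# independent of the Tempfile's current read position.
body = File.binread(file.path)

puts body.bytesize
puts body.encoding

file.close! # close and delete the temporary file
```

The resulting string can then be passed as body: to put_object, the same way new_file is used above.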
Upvotes: 1
Reputation: 1484
It seems that aws-sdk is the culprit. I tested other ways of uploading the same file:
(I was on a cellphone connection, so the network was really slow, and I did not take the time to install and configure the AWS CLI on a Heroku dyno.)
Benchmark.bm do |x|
  x.report { `aws s3 cp #{file.path} s3://#{ENV['S3_BUCKET']}/testfile.pdf` }
end
0.000000 0.000000 0.510000 ( 2.486112)
The following test with Fog was run from a Heroku dyno:
connection = Fog::Storage.new({
  :provider => 'AWS',
  :aws_access_key_id => ENV['S3_USER_KEY'],
  :aws_secret_access_key => ENV['S3_USER_SECRET'],
  region: "eu-west-1"
})
directory = connection.directories.new(key: ENV["S3_BUCKET"], region: "eu-west-1")
Benchmark.bm do |x|
  x.report do
    directory.files.create(
      :key => 'test-with-fog.pdf',
      :body => file,
    )
  end
end
      user     system      total        real
  0.010000   0.010000   0.020000 (  0.050712)
I will stick with the latter (Fog) as a workaround. Still, I have not found the reason for such slowness with aws-sdk.
Upvotes: 0