Reputation: 4871
I've noticed that boto3 takes about 3 times as long as boto2 to read the same objects from an S3 bucket. The Python script below illustrates the problem. My environment is Ubuntu 18.04, Python 3.7.9, boto 2.49.0, boto3 1.16.63.
The script uses 20 threads to read 1,000 objects from an S3 bucket. It takes 5 to 6 seconds using boto2, but 18 to 19 seconds using boto3.
I've tried various numbers of threads, and I've tried setting max_concurrency in the boto3 file transfer config. Neither seems to make a difference.
Can anyone say why boto3 is so much slower, or how to make it faster?
#!/usr/bin/python -u
"""
This script compares the performance of boto2 and boto3 for reading 1,000 small objects from an S3 bucket.
You'll need to change the value of BUCKET_NAME to the name of a bucket to which the script has read/write access.
"""
import boto
import boto3
from tempfile import NamedTemporaryFile
from threading import Thread
import time

BUCKET_NAME = 'deleteme-steve'
bucket2 = boto.connect_s3().get_bucket(BUCKET_NAME)
s3_boto3 = boto3.client('s3')

# Create 1,000 test objects in an S3 bucket. Once the objects exist, this code can be commented out.
with NamedTemporaryFile(mode='wt') as ntf:
    ntf.write('This is a test')
    ntf.flush()
    for i in range(1000):
        s3_boto3.upload_file(ntf.name, BUCKET_NAME, 'test{}'.format(i))


def read2(i):
    # Thread i reads objects 50*i .. 50*i + 49 using boto2.
    for j in range(50 * i, 50 * (i + 1)):
        k = bucket2.get_key('test{}'.format(j))
        with NamedTemporaryFile() as ntf:
            k.get_contents_to_file(ntf)


def read3(i):
    # Thread i reads objects 50*i .. 50*i + 49 using boto3.
    for j in range(50 * i, 50 * (i + 1)):
        with NamedTemporaryFile() as ntf:
            s3_boto3.download_fileobj(BUCKET_NAME, 'test{}'.format(j), ntf)


# Time 20 threads reading all 1,000 objects, first with boto2, then with boto3.
for boto_version in [2, 3]:
    threads = []
    start_time = time.time()
    for i in range(20):
        t = Thread(target=read2 if boto_version == 2 else read3, args=(i,))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    print('boto {}: {} seconds'.format(boto_version, time.time() - start_time))
Upvotes: 0
Views: 605
Reputation: 4871
It turns out that the slowness of boto3 occurs when using Python 2 (which is no longer supported), but not Python 3. With Python 3, boto2 and boto3 have approximately equal speed in my test.
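Since the result depends on which interpreter runs the script, it can help to print that alongside the timings. The shebang `#!/usr/bin/python` may resolve to Python 2 on some systems even when Python 3 is installed. Below is a minimal sketch of a timing harness that reports the interpreter. The names `describe_interpreter` and `time_threaded` are hypothetical helpers made up for illustration, and the reader passed in is a stand-in rather than a real S3 call.

```python
import sys
import time
from threading import Thread


def describe_interpreter():
    """Return a short string identifying the running Python interpreter."""
    return 'Python {}.{}.{} ({})'.format(
        sys.version_info[0], sys.version_info[1], sys.version_info[2],
        sys.executable)


def time_threaded(reader, num_threads=20, per_thread=50):
    """Call reader(j) for every j in range(num_threads * per_thread),
    split evenly across num_threads threads, and return elapsed seconds."""
    def worker(i):
        # Thread i handles indices per_thread*i .. per_thread*(i+1) - 1.
        for j in range(per_thread * i, per_thread * (i + 1)):
            reader(j)

    threads = [Thread(target=worker, args=(i,)) for i in range(num_threads)]
    start_time = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start_time


if __name__ == '__main__':
    print(describe_interpreter())
    # Stand-in reader that does nothing; swap in a function that
    # downloads object j (for example the body of read2 or read3).
    elapsed = time_threaded(lambda j: None)
    print('{:.3f} seconds'.format(elapsed))
```

Running the same file explicitly as `python2 script.py` and `python3 script.py` should make it clear which interpreter produced which number.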
Upvotes: 1