gmoss

Reputation: 989

botocore >= 1.28.0 slower in multithread application

The official Boto3 docs recommend creating a new resource per thread: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html#multithreading-or-multiprocessing-with-resources

Botocore 1.28.0 merged a feature which appears to generate a list of all possible endpoints on resource creation: https://github.com/boto/botocore/pull/2785

I have a test suite which uses motoserver and an application that relies heavily on parallelized downloads from and uploads to S3 from a process pool. With botocore 1.28.0, the test suite takes 20 minutes longer to run than with the previous version.

I've done some work with cProfile and can confirm that at least half of the additional time is spent inside botocore's load_service_model method, which is called during botocore client creation. I haven't tracked down the other ~50% of the extra time yet, but it's somewhere in botocore usage.
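For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of a cProfile wrapper that limits the report to functions matching a regex. The helper name profile_call and the stand-in function build_table are hypothetical and not part of botocore; only the standard-library cProfile and pstats calls are real.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, pattern=None, **kwargs):
    """Run func under cProfile and return (result, report_text).

    pattern is an optional regex; when given, only functions whose
    "file:line(name)" entry matches it appear in the report.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
    if pattern is None:
        stats.print_stats(20)       # top 20 entries by cumulative time
    else:
        stats.print_stats(pattern)  # only entries matching the regex
    return result, out.getvalue()


# Stand-in workload so the sketch runs without boto3 installed.
def build_table():
    return sorted(range(1000), reverse=True)


result, report = profile_call(build_table, pattern="build_table")
```

With boto3 installed, the same helper could be pointed at client creation, for example profile_call(boto3.client, "s3", pattern="load_service_model"), to see how much cumulative time that method accounts for.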

What can I do to speed this up again with the version bump?

Upvotes: 1

Views: 162

Answers (1)

trhr

Reputation: 148

Use a single pre-loaded loader instance, e.g.

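import threading

import boto3
import botocore.session
from botocore.loaders import Loader

# One shared loader, so parsed service models are cached once and reused.
preloader = Loader()

# Warm the cache up front. Note that each type name is a separate string.
for type_name in ('endpoint-rule-set-1', 'paginators-1'):
    preloader.load_service_model(service_name='s3', type_name=type_name)

session_lock = threading.Lock()

def _session():
    # Each call gets a fresh botocore session backed by the shared loader.
    session = botocore.session.get_session()
    session.register_component('data_loader', preloader)
    with session_lock:
        return boto3.session.Session(botocore_session=session)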

Then in your threads you can use:

session = _session()
resource = session.resource(...)
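If the work is submitted as many small tasks, it may also help to build the resource once per thread instead of once per task. The following is only a sketch of that idea using threading.local; per_thread and make_resource are hypothetical names, and make_resource is a stand-in that counts calls so the example runs without boto3.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def per_thread(factory):
    """Wrap factory so each thread builds its own object once and reuses it."""
    local = threading.local()

    def get():
        try:
            return local.value
        except AttributeError:
            # First call on this thread: build and remember the object.
            local.value = factory()
            return local.value

    return get


# Stand-in factory that records which thread called it. In the real
# application this would be something like: lambda: _session().resource(...)
calls = []
calls_lock = threading.Lock()


def make_resource():
    with calls_lock:
        calls.append(threading.get_ident())
    return object()


get_resource = per_thread(make_resource)

# 100 tasks on 4 worker threads: the factory runs at most once per thread.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda _: get_resource(), range(100)))
```

Each worker still gets its own resource, in line with the per-thread recommendation in the Boto3 docs, but the creation cost is paid at most once per worker rather than once per task.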

Upvotes: 1
