Reputation: 461
Is the boto3 low-level client for S3 thread-safe? The documentation is not explicit about it.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#client
A similar issue is discussed on GitHub:
https://github.com/boto/botocore/issues/1246
But there is still no answer from the maintainers.
Upvotes: 46
Views: 49965
Reputation: 3857
This was answered by the boto team on May 19, 2021. See source docs here.
Resource instances are not thread safe and should not be shared across threads or processes. These special classes contain additional meta data that cannot be shared. It's recommended to create a new Resource for each thread or process:
import boto3
import boto3.session
import threading

class MyTask(threading.Thread):
    def run(self):
        # Here we create a new session per thread
        session = boto3.session.Session()

        # Next, we create a resource client using our thread's session object
        s3 = session.resource('s3')

        # Put your thread-safe code here
In the example above, each thread would have its own Boto3 session and its own instance of the S3 resource. This is a good idea because resources contain shared data when loaded and calling actions, accessing properties, or manually loading or reloading the resource can modify this data.
Upvotes: 5
Reputation: 7808
If you take a look at the Multithreading/Multiprocessing documentation for boto3, you can see that they recommend one client per session, as there is shared data between instances that can be mutated by individual threads.
It also looks like there's an open GitHub issue for this exact question. https://github.com/boto/botocore/issues/1246
Upvotes: 34
Reputation: 141
You can use multiple threads, but you have to instantiate a new session per thread/process. That way you can, for example, download from an S3 bucket concurrently.
An example is below:
import concurrent.futures
import boto3
import json

files = ["path-to-file.json", "path-to-file2.json"]

def download_from_s3(file_path):
    # set up a new session for this call
    sess = boto3.session.Session()
    client = sess.client("s3")
    # download the object and parse its body as JSON
    obj = client.get_object(Bucket="<your-bucket>", Key=file_path)
    resp = json.loads(obj["Body"].read())
    return resp

with concurrent.futures.ThreadPoolExecutor() as executor:
    # consume the iterator so results are kept and worker exceptions surface
    results = list(executor.map(download_from_s3, files))
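The function above builds a new session and client for every file. Since a pool reuses its worker threads, one client per thread should be enough. Here is a minimal sketch of caching a client per thread with `threading.local`. The names `get_thread_client`, `download` and `FakeClient` are hypothetical, and `FakeClient` only stands in for something like `boto3.session.Session().client("s3")` so the snippet runs without AWS credentials:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_thread_local = threading.local()


def get_thread_client(make_client):
    """Return the calling thread's client, creating it on first use."""
    client = getattr(_thread_local, "client", None)
    if client is None:
        client = make_client()
        _thread_local.client = client
    return client


class FakeClient:
    """Stand-in for an S3 client (an assumption, not part of boto3)."""

    def __init__(self):
        # Remember which thread created this instance.
        self.owner = threading.get_ident()

    def get(self, key):
        return f"contents of {key}"


def download(key):
    client = get_thread_client(FakeClient)
    # The cached client is only ever used by the thread that created it.
    assert client.owner == threading.get_ident()
    return client.get(key)


keys = ["a.json", "b.json", "c.json"]
with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(download, keys))
```

With real boto3 you would pass a factory such as `lambda: boto3.session.Session().client("s3")` instead of `FakeClient`.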
Upvotes: 3
Reputation: 361
From documentation:
Low-level clients are thread safe. When using a low-level client, it is recommended to instantiate your client then pass that client object to each of your threads.
Instantiating a client is not thread-safe, although a client instance is. To make this work in a multi-threaded environment, guard instantiation with a global Lock like this:
import threading
import boto3

boto3_client_lock = threading.Lock()

def create_client():
    # only one thread at a time may create a client
    with boto3_client_lock:
        return boto3.client('s3', aws_access_key_id='your key id', aws_secret_access_key='your access key')
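Because the quoted documentation says a client instance can be shared, the locked creation can also be done once and the result reused. Here is a minimal sketch of that idea. `get_shared_client` and `make_fake_client` are hypothetical names, and the fake factory stands in for a call like `boto3.client('s3')` so the snippet runs without boto3:

```python
import threading

_client = None
_client_lock = threading.Lock()


def get_shared_client(make_client):
    """Create the client once under a lock, then return the same instance."""
    global _client
    with _client_lock:
        if _client is None:
            _client = make_client()
        return _client


created = []


def make_fake_client():
    # Stand-in for boto3.client('s3'); records each creation.
    created.append(object())
    return created[-1]


threads = [
    threading.Thread(target=get_shared_client, args=(make_fake_client,))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However many threads call `get_shared_client`, the factory runs only once, so the creation step that is not thread-safe never runs concurrently.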
Upvotes: 26
Reputation: 219
I recently tried using a single boto client instance with concurrent.futures.ThreadPoolExecutor. I ran into exceptions coming from boto, so I assume the boto client is not thread-safe in this case.
The exception I got:
  File "xxx/python3.7/site-packages/boto3/session.py", line 263, in client
    aws_session_token=aws_session_token, config=config)
  File "xxx/python3.7/site-packages/botocore/session.py", line 827, in create_client
    endpoint_resolver = self._get_internal_component('endpoint_resolver')
  File "xxx/python3.7/site-packages/botocore/session.py", line 694, in _get_internal_component
    return self._internal_components.get_component(name)
  File "xxx/python3.7/site-packages/botocore/session.py", line 906, in get_component
    del self._deferred[name]
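The frames above pass through `client` and `create_client`, which suggests the error was raised while a client was being created inside a worker, not while an existing client was being used. Here is a minimal sketch of creating the client once before the pool starts and passing the same object to every task. `fetch_all`, `fetch_one` and `FakeClient` are hypothetical names, and `FakeClient` stands in for a real client so the snippet runs without boto3:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


class FakeClient:
    """Stand-in for an S3 client (an assumption, not part of boto3)."""

    def get(self, key):
        return key.upper()


def fetch_one(client, key):
    # Each task uses the client it is given and never creates one itself.
    return client.get(key)


def fetch_all(client, keys, max_workers=4):
    """Run fetch_one for every key, sharing one pre-built client."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(partial(fetch_one, client), keys))


client = FakeClient()  # with boto3 this might be boto3.client("s3")
results = fetch_all(client, ["a", "b"])
```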
Upvotes: 16